Functional Encapsulation and Type Reconstruction in a 
Strongly-typed, Polymorphic Language 



Shail Aditya Gupta 



MIT / LCS / TR-647 
February 1995 



© Shail Aditya 1995 

The author hereby grants to MIT permission to reproduce and to 
distribute copies of this technical report in whole or in part. 



This report describes research done at the Laboratory of Computer Science of the Massachusetts 
Institute of Technology. Funding for this work has been provided in part by the Advanced 
Research Projects Agency of the Department of Defense under the Office of Naval Research 
contract N00014-92-J-1310. 

This report was originally published as the author's doctoral dissertation. 




MIT Laboratory for Computer Science 
545 Technology Square 
Cambridge MA 02139 



Abstract 

Static type systems are traditionally used to prevent run-time type-errors in user programs
and to assign appropriate storage representations to objects during compilation. In this thesis,
we explore some new ways of using static type information in the design, compilation, and
execution of programs written in a strongly-typed, polymorphic language.

Programmers often find it useful to know whether or not a particular data-structure may be
updated outside a given control block. Information about an object's non-mutability helps
compiler optimizations, improves aliasing and dependence analyses, and permits unrestricted
caching of functional data at run-time. In the first part of this thesis, we present a safe, static
mechanism for functional encapsulation of imperative data-structures using a powerful type
system based on closure types and regions. We introduce a new language construct called
close which delimits the scope of side-effects on imperative objects and converts them into
functional objects outside that scope. This mechanism may be used to build efficient, high-level,
functional data-abstractions within a language using its low-level, imperative constructs.
Type-safety and non-mutability of closed objects are guaranteed by a semantic soundness theorem
that ensures consistency between the static and the dynamic semantics. The type system is
presented in the context of Id, which is a strongly-typed, polymorphic, higher-order language,
and it easily simplifies to a first-order, monomorphic language such as C or Fortran.

In the second part of the thesis, we develop a general, compiler-directed methodology for
complete type reconstruction of run-time objects in a polymorphic language without using any
run-time type-tags. Run-time type reconstruction is carried out by instantiating static type
information for each function activation frame present within the dynamic call tree. Additional
type-hints are inserted automatically at compile-time and are decoded at run-time to ensure
complete type reconstruction. We present the necessary compiler analysis and the type
reconstruction algorithm and prove their correctness. This technique has been used successfully
for displaying run-time objects within the Id source debugger for Monsoon and to perform
tagless garbage collection in the *T architecture. We describe the latter application in detail,
comparing its performance with other schemes for automatic storage reclamation.

Key Words and Phrases: Functional Encapsulation, Operational Semantics, Polymorphism,
Type Soundness, Imperative Typing, Closure Typing, Regions, Typed Run-time System, Type
Reconstruction, Type Conservation, Type-Hints, Tagless Garbage Collection.



Acknowledgments^ 



I am most grateful to my thesis advisor and my mentor, Prof. Arvind, without whose guidance
and encouragement I would not have seen this day. He showed a keen interest in me when I first
joined MIT in the fall of 1987 and took his course on Dataflow and Reduction Architectures
(6.847). I was highly impressed by his magnetic personality, a clear vision of the future of
parallel computing, and a strong conviction for achieving that research goal. After seven years,
I still remain deeply impressed, and in many ways, greatly influenced by his personality and
research ideas. I thank him for giving me the opportunity to work with him and with the other
members of the Computation Structures Group, which for the last seven years, has been a truly
exciting and friendly research atmosphere to work in.

I would also like to thank the other members of my thesis committee, Prof. Albert R. Meyer
and Dr. Rishiyur S. Nikhil, who provided valuable advice and guidance from time to time and
helped me get through my thesis defense with ease. I am especially grateful to Dr. Nikhil who
previously supervised my Master's thesis. He helped me shape my graduate academic career
and has been a continuous source of support and inspiration.

Xavier Leroy from INRIA and Satyan Coorg from CSG gave enormous technical help while
developing the type system presented in the first part of this thesis. Prof. Arvind provided 
valuable insights into the language design issues that led to the current design of the close 
construct. James Hicks from MCRC and Christine Flood from CSG helped in implementing 
and comparing various storage reclamation schemes described in Chapter 8. I sincerely thank 
them all for their time and patience in helping me complete this research. 

I heartily thank all the members of the Computation Structures Group, both past and
present, for their continuing friendship and support, making it all feel like a big family. I have
made some of the best friends of my life in this group both professionally and personally. I
would especially like to thank Zena Ariola for providing moral support and sound advice, and
Michael Halbherr for his delightful friendship and a wonderful time in Europe.

I would also like to thank Gita, Prof. Arvind's wife, whose love and care for me, home
cooked food, and invitations to participate in family festivities really provided me with the
feeling of a "home away from home".

I also thank all my other friends at MIT and elsewhere, and members of the music group
Gunjan for their enjoyable company and memorable experiences, creating welcome diversions
from work and making these past several years some of the most cherished moments of my life.

I am eternally grateful for the love and affection bestowed upon me from my family. The
encouragement and support I received from my parents, Vidyaratna and Kusum, is beyond
measure. This thesis is as much a fulfillment of their dream as it is of mine. It is the fruit of
their tremendous confidence in my abilities under all circumstances. I am very grateful to my
brothers, Vikram and Uday, for their love and support and who took all my responsibilities
at home upon themselves. I am also extremely grateful to my sisters, Archana and Kalpana,
who provided me with enormous love and affection as well as strong moral support and sound
advice during difficult times.

Finally, I am thankful beyond words to The Almighty God for keeping me steady on my
path, giving me the strength and will power to do the right thing every time, helping me to be
at peace with myself in the face of sorrow or joy, and ultimately making my lifelong academic
dream come true. May He continue to guide my path in the same way in future. Amen.



^Funding for this work has been provided in part by the Advanced Research Projects Agency of the
Department of Defense under the Office of Naval Research contract N00014-92-J-1310.



To my parents, 
Vidyaratna and Kusum. 



Contents 



Abstract 3 

Acknowledgments 5 

1 Introduction 15 

1.1 Layered Language Design 15 

1.2 Id: A Strongly-Typed, Layered Programming Language 17 

1.3 Applications to Conventional, Imperative Languages 18 

1.4 Outline of the Thesis 19 

1.4.1 Part I 19 

1.4.2 Part II 20 

1.4.3 How to Read this Thesis 20 

I Types in Language Design: Functional Encapsulation 23 

2 Functional Encapsulation of Imperative Data-Structures 25 

2.1 The Problem 25 

2.1.1 Abstraction and Polymorphism 26 

2.1.2 Outline of our Approach 27 

2.2 Imperative Type Systems 28 

2.2.1 Simple Hindley/Milner Type Inference 28 

2.2.2 Type System of Standard ML 29 

2.2.3 Type System of Standard ML of New Jersey 30 

2.2.4 Limitations of the Standard ML Type Systems 31 

2.2.5 Effect Systems 32 

2.2.6 Syntactic Closure Typing System 35 

2.2.7 Choosing an Imperative Type System 37 

2.3 Closing Imperative Data-Structures 38 

2.3.1 A Proposal for "Close" 38 

2.3.2 Guaranteeing Type-Safety 39 

2.3.3 Guaranteeing Non-Mutability 40 

2.3.4 Efficiency and Parallelism 41 

2.3.5 Termination of Side-Effects before "Close" 42 

2.4 Sound Typings for Imperative/Closed Objects 45 

2.4.1 Modeling "Imperativeness" in Types 45 

2.4.2 Handling the Environment 46 

2.4.3 Handling Structured Results 47 



2.4.4 Handling Functions 48 

2.5 Summary 49 

3 Semantics of "Close" 51 

3.1 Kernel Expression Language 51 

3.1.1 Expression Syntax 51 

3.1.2 Dynamic Semantics 52 

3.1.3 Properties of the Evaluation Rules 57 

3.2 A Closure Typing System 61 

3.2.1 Type Syntax 61 

3.2.2 Static Semantics 63 

3.2.3 Properties of the Typing Rules 66 

3.3 Type Soundness 68 

3.3.1 Semantic Model 68 

3.3.2 Properties of the Semantic Model 69 

3.3.3 Type Soundness 70 

3.4 Type Inference 79 

4 Closing Data-Structures 81 

4.1 Specification of "Close" for Multi-Level Data-Structures 81 

4.1.1 Dynamic Semantics Issues 82 

4.1.2 Static Semantics Issues 83 

4.1.3 Combining Type Generalization and Closing 84 

4.1.4 Discussion 85 

4.1.5 Closing a Fixed Set of Regions/Locations 86 

4.1.6 Type Annotations as "Close" Specifications 87 

4.2 Closing Arrays 89 

4.2.1 Dynamic Semantics 89 

4.2.2 Static Semantics 90 

4.2.3 Semantic Model and Soundness 91 

4.2.4 Modeling I-Structure and M-Structure Arrays 92 

4.3 Closing General Algebraic Datatypes 93 

4.3.1 Specification Issues 93 

4.3.2 Syntactic Specification of Algebraic Datatypes 95 

4.3.3 Dynamic Semantics 96 

4.3.4 Static Semantics 98 

4.3.5 Soundness 99 

4.4 Functional Encapsulation in Conventional Languages 99 

4.5 Conclusions 100 

4.5.1 Summary of Part I 100 

4.5.2 Implementation Status 101 

4.5.3 Future Work 101 



II Types in Run-time System Design: Type Reconstruction 103 






5 A Typed Run-time System 105 

5.1 Introduction 105 

5.2 Design Issues for a Typed Run-time System 105 

5.2.1 Strong vs. Weak Typing 106 

5.2.2 Static vs. Dynamic Typing 106 

5.2.3 Tagged vs. Untagged Object Model 107 

5.2.4 Type Maintenance vs. Type Reconstruction 107 

5.2.5 Polymorphism and Higher-order Functions 108 

5.2.6 Type Inference vs. Type Declaration 108 

5.3 Our Approach 108 

5.4 Applications of Complete Run-time Type Reconstruction 109 

5.4.1 Polymorphic Source Debugging 109 

5.4.2 Tagless Garbage Collection 109 

5.4.3 Object-based I/O 110 

5.5 Outline 111 

6 Compiler-directed Polymorphic Type Reconstruction 113 

6.1 Type Reconstruction Problem 113 

6.1.1 Basic Type Reconstruction Scheme 114 

6.1.2 Problems with Closures and Free Variables 116 

6.1.3 Discussion 117 

6.2 Type Reconstruction Framework 118 

6.2.1 Run-time Model of Program Execution 118 

6.2.2 Type Reconstructibility 119 

6.2.3 Recording Compile-time Type Information 120 

6.2.4 The Principle of Type Conservation 120 

6.3 Compiler Support for Type Reconstruction 123 

6.3.1 Detecting Violations of Type Conservation 123 

6.3.2 Propagating Non-Conserved Type Information across Functions 124 

6.3.3 Program Translation 125 

6.4 Run-time Type Reconstruction 126 

6.4.1 A Type Reconstruction Example 127 

6.5 Compiler Optimizations 128 

6.5.1 Rearranging the Hint Parameters 128 

6.5.2 Arity Analysis 129 

6.5.3 Escape Analysis 129 

6.5.4 Tail Calls 130 

6.5.5 Type Specialization 130 

6.6 Implementation Status 131 

6.6.1 Type Reconstruction in a Polymorphic Source Debugger 131 

6.6.2 Type Reconstruction for Tagless Garbage Collection 132 

7 Formal Framework for Run-time Type Reconstruction 133 

7.1 The Kernel Id Intermediate Language 133 

7.2 Compiler Support for Type Reconstruction 134 

7.2.1 A Type System for Computing Type-hints 134 

7.2.2 Type Inference 137 

7.2.3 Program Translation and Type-Hint Generation 137 






7.2.4 Discussion 140 

7.3 Run-time Type Reconstruction 142 

7.3.1 Type Reconstruction Requirements 142 

7.3.2 The Reconstruction Algorithm 142 

7.3.3 Reconstruction Complexity 144 

7.4 Correctness of the Type Reconstruction Algorithm 145 

7.4.1 Simple Expression Language and its Semantic Model 146 

7.4.2 Partial Execution and the Dynamic Activation Tree 146 

7.4.3 Type Reconstruction 149 

7.4.4 The Type Reconstruction Algorithm 150 

7.4.5 Correctness of the Algorithm 151 

8 Application Study: Tagless Garbage Collection 155 

8.1 Introduction 155 

8.1.1 Storage Reclamation without Run-time Type Information 156 

8.1.2 Garbage Collection using Run-time Type Reconstruction 156 

8.1.3 Related Work 157 

8.1.4 Goals and Scope of the Study 157 

8.1.5 Outline 158 

8.2 Framework for Tagless Garbage Collection 158 

8.2.1 Object Representations and the Memory Model 158 

8.2.2 Overall Strategy 160 

8.3 Compiler Support for Object Identification 160 

8.3.1 Visible and Invisible Datatypes 160 

8.3.2 Modeling Function Closures 160 

8.3.3 Modeling Activation Frames 161 

8.3.4 Run-time Type Encodings 162 

8.4 Run-time Object Traversal and Marking 163 

8.4.1 Interpreted Marking 163 

8.4.2 Compiled Marking 165 

8.4.3 Variations on Marking Schemes 166 

8.5 *T Implementation 166 

8.5.1 Multi-threaded Execution: Processor View 167 

8.5.2 Multi-threaded Execution: System View 167 

8.5.3 Memory Organization 168 

8.5.4 Garbage Collection on *T 170 

8.6 Performance Results and Analysis 172 

8.6.1 Benchmark Runs 172 

8.6.2 Performance Analysis 175 

8.7 Conclusions 178 

8.7.1 Future Work 178 

Bibliography 181 






List of Figures 



1.1 The Layered Design of the Id Language 18 

2.1 Conversions among Synchronization Protocols at the time of Closing 44 

3.1 The Dynamic Semantics of the Kernel Expression Language 55 

3.2 The Static Semantics of the Kernel Expression Language 65 

4.1 Dynamic Semantics of Arrays 90 

5.1 Design Issues for a Typed Run-time System 106 

6.1 The Run-time State of Computation in Example 6.1 115 

6.2 The Parallel Execution Model for Id 119 

6.3 Kernel Id definition and the Type-map of map function 121 

6.4 Visible and Invisible Application Sites 122 

6.5 The Kernel Id definition and type-map of function h3 from Example 6.3 124 

6.6 The Run-time State of Computation in Example 6.8 127 

7.1 The Kernel Id Intermediate Language 134 

7.2 Rules for computing Non-Conserved Type Information for Kernel Id Programs 135 

7.3 Encoding and Decoding of Type Schemes 138 

7.4 Program Translation and Hint Generation Rules 139 

7.5 The Type Reconstruction Algorithm 143 

7.6 The Evaluation Derivation Tree for Example 7.3 147 

8.1 Run-time Object Representations for Id 159 

8.2 Automatic Derivation of Invisible Datatypes 162 

8.3 Generating Mark Functions for Datatypes 164 

8.4 Type-code Interpretation at Run-time 164 

8.5 Type-based Translation at Compile-time 165 

8.6 The Organization of Computation Nodes and Memory Nodes in the *T machine. 169 

8.7 Performance Results for Quicksort and Paraffins 173 

8.8 Performance Results for Gamteb and Wavefront 174 

8.9 Total Cost and Run-time System Cost for the Benchmarks 175 

8.10 Run-time System Cost Breakup 176 






Chapter 1 

Introduction 



One of the main goals of modern, high-level programming languages is to provide an intuitive
programming model that is useful in writing applications and reasoning about their behavior.
Some languages enforce a style of programming that guarantees useful properties for the
programs written in those languages. In this thesis, we concentrate on a class of high-level
languages that are strongly-typed. Strong-typing enforces type-consistency which imparts a
degree of robustness to the program. A type-consistent program is guaranteed never to run into
a run-time type-error, e.g., attempting to use an integer in a floating-point computation or
applying a non-function object to an argument.

In a strongly-typed language, type-consistency can be enforced during compilation (static
typing) or during execution (dynamic typing). The compiler for a statically-typed language has
to be somewhat conservative in enforcing type-consistency: it may reject certain programs that
appear to be inconsistent, although such programs may not encounter a run-time type-error for
certain inputs (or even for all possible inputs). The advantage of being conservative is that a
program that has been statically determined to be type-consistent is guaranteed to execute in
a type-consistent manner for all inputs. Therefore, no checks for type-consistency need to be
made during its execution.
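As a small illustration of this conservatism, consider the following sketch. It is written in Python rather than Id, and the function name `pick` is hypothetical. A static checker in the Hindley/Milner tradition would typically reject the conditional because its two branches have different types, even though this particular use never encounters a run-time type-error.

```python
from typing import Union


def pick(flag: bool) -> Union[int, str]:
    # The two branches have different types, which is what a
    # conservative static checker would object to.
    return 1 if flag else "one"


# Only the int branch is ever taken here, so the addition below
# executes without any run-time type-error.
result = pick(True) + 1
```

The program is dynamically safe for this input, yet a static discipline must reason about all inputs at once, which is why it errs on the side of rejection.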

Type information is primarily used in statically-typed languages to check for type-consistency
within programs and to choose memory representations for data-structures. Most of this
information is thrown away once a program has been compiled. At most, some type information
may be saved in a symbol table to be used by a source-level debugger. In this thesis, we wish
to explore more fundamental ways in which to incorporate and use type information in the
design, compilation, and execution of a program written in a strongly-typed language. We wish
to use the type system of our language as a tool for structuring the language design into tight
abstraction layers, for supporting compiler optimizations and automatic code generation,
and for supporting run-time facilities such as source-level debugging and garbage collection.

1.1 Layered Language Design 

Modern, high-level languages offer a variety of data and control abstraction mechanisms to
enable users to structure their applications properly. Most programming language designs
fall into one of the following two categories: either a language includes a large repertoire of
common datatypes and their manipulation functions as part of its definition as in Common
Lisp [SJ90], or these objects are defined separately in a standard prelude or in system and user
libraries as in the case of Standard ML [MT91, MTH90], Haskell [HWe90], or C [Pla92]. The
first approach sometimes leads to language definitions that may be too large to understand,
implement and reason about. The second approach usually leads to small and simple language
kernels that may be used to "implement" high-level datatypes and their associated functions
as independent libraries. This approach seems better in terms of overall ease of understanding
and maintainability of the language, though it requires a careful design of the libraries and their
user interface.

The recent success of strongly-typed, polymorphic, functional languages, such as Haskell
and Standard ML, highlights the importance of this layered approach to language design. Small
language kernels have manageable semantic complexity and can be subjected to powerful
reasoning techniques. At the same time, a small set of kernel primitives that can be suitably
mapped to the underlying architecture provides a flexible and efficient means of implementing
pre-defined and user-defined high-level datatypes. In order for such kernel implementations to
be sound and transparent to the end-user, a proper data and type abstraction mechanism must
be provided in the kernel language. Otherwise, the semantic correctness of the implementation
may be in doubt. An example of this situation is the C language [KR88] which offers complete
flexibility of a low-level kernel language but lacks a tight abstraction mechanism, leading
sometimes to subtle errors in user programs. For this reason, many high-level languages offer only a
fixed set of high-level constructs with pre-defined semantics rather than provide the user with
the complete flexibility and the raw power of a low-level kernel language. The list comprehension
mechanism, first introduced in NPL [Bur77, Dar77] and later adopted in Miranda [Tur85]
and Haskell, is an example of such a language construct.

From a language design standpoint, a powerful type system can be used to enforce the type
abstraction desired for kernel language implementations of high-level datatypes in libraries
without changing the high-level language definition or modifying the compiler. In Part I of this
thesis, we are going to present a type system that will allow us to build a data-structure in a
low-level, imperative style and then safely encapsulate it as a functional data-structure. The
motivation for doing so is as follows.

First, our type system reduces the complexity of writing compilers for functional languages.
Functional syntactic constructs, such as list and array comprehensions, that have to be
implemented within the compiler as primitive constructs, can instead be desugared into ordinary
functions that are implemented in an independent system library. This is possible because we
allow the programmer to use low-level imperative constructs while implementing the library,
and these constructs are safely encapsulated within functional abstractions provided by the
type system. This approach is also very flexible since it allows modification and extension of
existing language constructs as well as addition of new constructs without disturbing the bulk
of the compiler.
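The desugaring idea can be sketched roughly as follows. The example is in Python rather than Id, and the names `build_list` and `desugared` are hypothetical. The library routine fills a mutable buffer internally and hands back an immutable tuple, so its callers only ever see a functional value.

```python
from typing import Callable, Iterable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def build_list(f: Callable[[A], B], keep: Callable[[A], bool],
               xs: Iterable[A]) -> Tuple[B, ...]:
    """Hypothetical library routine standing in for a list comprehension."""
    buf = []                   # mutable buffer, private to this call
    for x in xs:
        if keep(x):
            buf.append(f(x))   # the side-effect is confined to `buf`
    return tuple(buf)          # only an immutable value escapes


# The comprehension  [x * x for x in range(6) if x % 2 == 0]
# would be rewritten by the front end into an ordinary library call:
desugared = build_list(lambda x: x * x, lambda x: x % 2 == 0, range(6))
```

In this sketch the compiler only needs to know how to rewrite the syntax into a call; the traversal and filtering logic lives in the library and can be changed without touching the compiler.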

Second, our type system provides a way to safely implement functional computations using
imperative algorithms that cannot otherwise be expressed in a functional style efficiently.
Notable examples that have this characteristic are accumulation (histogramming) algorithms and
graph algorithms. Although the final result may be functional, the computation often needs to
be performed in an imperative way in order to achieve efficiency in space and time. Using our
type system, an imperative computation can be safely embedded within a functional program
while still preserving its clean semantics and simple reasoning.
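A histogram is a convenient way to picture this. The following Python sketch (the function name `histogram` and its parameters are illustrative, not taken from the thesis) accumulates counts in place and returns an immutable result. It assumes every sample lies in the range `0 <= s < n_buckets`.

```python
from typing import Iterable, Tuple


def histogram(n_buckets: int, samples: Iterable[int]) -> Tuple[int, ...]:
    counts = [0] * n_buckets   # imperative accumulator, local to this call
    for s in samples:          # assumes 0 <= s < n_buckets
        counts[s] += 1         # in-place update: O(1) per sample
    return tuple(counts)       # the caller receives a read-only value


hist = histogram(4, [0, 1, 1, 3, 3, 3])
```

Python enforces the immutability of the returned tuple only at run-time; the type system developed in Part I is intended to give the corresponding guarantee statically.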

From the standpoint of a compiler, working hand-in-hand with a powerful type system can
prove to be more fruitful than working around it, as most compilers tend to do. Static types of
program fragments provide valuable information about "what" is being computed. The shape
and size of data-structures and the input/output parameters of functions can be determined
using their static types. Intelligent compilers can use this information while performing
important optimizations such as boxing/unboxing of data, code specialization, and register allocation.
Unfortunately, very few compilers actually propagate the full source type information all the
way to the back-end, the Glasgow Haskell compiler [PJ92] being a notable exception. In a
layered language, this task is considerably simplified since only a small number of kernel language
constructs are involved within the later phases of the compiler.

It is also possible to use source type information at run-time to display objects during
execution, or to output them to a file, or to perform garbage collection. A run-time system that
has access to complete source type information from the compiler may not need to maintain
such information independently, say in the form of object type-tags, in order to handle such
applications. The compiler and the run-time system could be made to cooperate in
automatically recreating and using this type information when needed. In Part II of this thesis, we
will explore the technique of run-time type reconstruction that reconstructs the exact type of
every object on demand without paying the overhead of type maintenance. Furthermore, we
will explore ways in which static type information can be used to automatically generate
specialized routines at compile-time for each data and control object within the program in order
to perform such tasks.
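The flavor of tagless reconstruction can be conveyed by a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the table formats `TYPE_MAPS` and `CALL_SITES`, the `Frame` class, and the keying of call sites by function names are hypothetical, and the sketch ignores closures and type-hints altogether. It only shows how the type of a polymorphic local could be recovered by instantiating static type information along the chain of activation frames.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical compile-time output: the static type of each local,
# possibly mentioning type variables such as 'a.
TYPE_MAPS: Dict[str, Dict[str, str]] = {
    "main": {"xs": "list int"},
    "wrap": {"ys": "list 'b"},
    "length": {"l": "list 'a"},
}

# Hypothetical compile-time output: how each call site binds the
# callee's type variables in terms of the caller's types.
CALL_SITES: Dict[str, Dict[str, str]] = {
    "main->wrap": {"'b": "int"},
    "wrap->length": {"'a": "'b"},
}


@dataclass
class Frame:
    function: str
    caller: Optional["Frame"] = None


def substitute(ty: str, bindings: Dict[str, str]) -> str:
    # Replace every bound type variable, token by token.
    return " ".join(bindings.get(tok, tok) for tok in ty.split())


def bindings_for(frame: Frame) -> Dict[str, str]:
    if frame.caller is None:
        return {}              # the root frame is monomorphic
    outer = bindings_for(frame.caller)
    site = CALL_SITES[f"{frame.caller.function}->{frame.function}"]
    return {tv: substitute(ty, outer) for tv, ty in site.items()}


def reconstruct(frame: Frame, var: str) -> str:
    return substitute(TYPE_MAPS[frame.function][var], bindings_for(frame))


leaf = Frame("length", Frame("wrap", Frame("main")))
leaf_type = reconstruct(leaf, "l")
```

No object carries a tag here: the type of `l` is derived purely from static tables and the shape of the call chain.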

1.2 Id: A Strongly-Typed, Layered Programming Language 

The idea of using type information within the design of a high-level language, its compiler, or
its run-time system is not new. But very few systems make use of source type information
right from the design of an application all the way down to its execution in a coherent manner.
This research is geared towards such an integrated approach to managing type information in
the context of the parallel programming language Id [Nik91], developed at the Computation
Structures Group, Laboratory for Computer Science, MIT.

Id is a high-level, strongly-typed language and it uses the Hindley/Milner polymorphic type
system and its automatic type inference mechanism [Mil78, DM82] at its functional core. Id
also offers imperative data-structures (I-structures [ANP89] and M-structures [BNA91]) that
cater to imperative styles of programming. Id is a layered language by design (see Figure 1.1).
The language and its implementation can be divided into three distinct layers: the user-level
functional layer, the system-level imperative layer, and the architecture-level implementation
layer.

At the highest level of functionality, the Id language provides high-level constructs such as
arrays, lists, tuples, higher-order functions, and user-defined algebraic types. Special syntactic
constructs, such as array and list comprehensions and pattern matching, are also provided.
Applications manipulating these objects make use of system and user libraries that support or
extend the functionality provided by the compiler.

The system-level layer consists of the Id kernel language. The primitive I-structure and
M-structure datatypes provide the basic data-structuring and synchronized memory access
mechanisms in this language. These primitive datatypes are used to represent all high-level
data-structures. Loops and procedures constitute the basic control mechanism. The compiler
translates high-level syntactic constructs such as pattern matching, and list and array
comprehensions into primitive operations on kernel datatypes. The system and user libraries may also
make use of these kernel constructs to implement high-level data-structures.

Finally, the architecture-level layer consists of the run-time system of the language and is
responsible for implementing the Id execution model and managing the synchronized memory.
The compiler also generates type information and run-time support code for garbage collection
and source-level debugging that can be directly linked along with the object code to perform
these auxiliary tasks during execution.

[Figure 1.1: The Layered Design of the Id Language. The figure shows three layers.
User-level: List and Array Comprehensions, Algebraic Types, Higher-order Functions, Pattern Matching.
System-level: Primitive Scalar Datatypes, I-Structures and M-Structures, Loops and Procedures.
Architecture-level: Run-Time System (Heap and Frame Manager, Multi-threaded Scheduler, Support for GC, I/O, Debugger).
Other labels in the figure: Library and Compiler Support, User Libraries, System Libraries,
Application Object Code, Type-directed Automatic Code Generation.]

This layered design presents a very flexible interface to the application writer where more
functionality can be added to the user-level simply by adding more system-level libraries
written in the kernel language. The type system is responsible for clearly defining and enforcing
the abstraction between the two layers so that polymorphic, functional behavior and simple
reasoning can be preserved at the user-level. At the same time, the Id run-time system is able
to map the system-level kernel language constructs onto the underlying target architecture in
an efficient way, independent of the source language used.

1.3 Applications to Conventional, Imperative Languages 

The functional encapsulation mechanism described in this thesis is not only applicable to
higher-order, polymorphic languages like Id and Haskell, but also to conventional, monomorphic
languages like C and Pascal. This mechanism allows safe conversion of mutable objects into
read-only functional objects. This transformation is useful for both sequential and parallel
versions of conventional imperative languages. We discuss some of these uses below.

The most important property of a functional object is that its value does not change during
the course of execution of the program. Therefore, a functional object may be freely copied
if necessary, or conversely, excessive copies may be freely eliminated. This property leads to
obvious compile-time optimizations such as common sub-expression elimination, code-hoisting,
and memory-fetch elimination that attempt to reduce the number of copies. This also permits
unlimited caching of such functional data in a parallel machine without any risk of
write-invalidation. In parallel systems using software-controlled shared-memory protocols [Nik94,
FLR+94] this may directly translate into cheaper protocols for object access and migration.

While writing parallel programs, programmers often make implicit assumptions that a
shared, mutable object may not be updated outside a given control block or that a
particular processor may have exclusive access to a shared object without actually locking it. Such
assumptions are usually based upon the implicit logic of the program and as such it may be quite
difficult to prove their correctness. With a little help from the user in identifying such objects,
the encapsulation mechanism described in this thesis can verify such assumptions
automatically. This mechanism also allows making safe, unsynchronized access to such shared objects
outside their encapsulated control-block because the objects are guaranteed to be read-only at
that point.

Finally, conversion of mutable objects into functional objects also improves other
compile-time analyses such as memory-aliasing analysis and loop-dependence analysis by clearly
disambiguating between read-only and read-write data. This, in turn, may benefit automatic
parallelization of sequential programs that make use of such analyses.

Thus, providing the ability to restrict the scope of side-effects to mutable data-structures
translates into important optimizations at all levels of program design and implementation.
This thesis provides the basic type-based framework for making such optimizations feasible.

1.4 Outline of the Thesis 

1.4.1 Part I 

This thesis is divided into two parts. In Part I (Chapters 2, 3 and 4), we describe a powerful
type system that has the ability to encapsulate programs constructing mutable data-structures
and view them as returning functional data-structures while guaranteeing that no more updates
take place on the returned objects outside the encapsulation.

Chapter 2 is an informal and intuitive condensation of the major ideas in Part I. We
introduce the problem by means of a simple example involving functional arrays in Id. We
briefly survey the literature comparing various existing imperative type systems and informally
describe our solution as an extension to one of the existing type systems. Then, we discuss
"language-level issues" such as type-safety, polymorphism and non-mutability within our type
system and how they interact with "system-level issues" such as space and time efficiency,
parallelism, and memory synchronization. Finally, we describe specific strategies used in our
type system that take care of these issues.

Chapter 3 describes the formal machinery and the soundness proof of our type system that
is the main theoretical contribution of Part I. We start with the description of a small,
imperative language containing simple mutable locations and a special language construct called
close to convert them into immutable locations. We provide the dynamic and static semantics
of this language in terms of relational axioms and inference rules and show their useful
semantic properties. Then, we set up a semantic model that defines a consistent relation between
values and their types. This relation maps read-write locations to mutable types and read-only
locations to functional types. Finally, we prove a soundness theorem stating that the static and
the dynamic semantics of our expression language are consistent with respect to each other.
It follows immediately that the mutable objects that are successfully converted into functional
objects under our type system are never updated again dynamically.

Chapter 4 extends the formal machinery of Chapter 3 to complex datatypes such as arrays,
tuples, functions, and general algebraic datatypes. We discuss how the user would syntactically
specify the conversion of arbitrary imperative data-structures into functional ones and how the
compiler would automatically verify the soundness of this conversion. Finally, we summarize
the results of Part I and discuss directions for future research.

1.4.2 Part II 

In Part II (Chapters 5, 6, 7 and 8), we study the technique of complete run-time type
reconstruction and its various applications within the run-time system for Id.

Chapter 5 discusses some design issues that affect the use of type information within a
run-time system. There are "language issues" such as strong vs. weak typing, and allowing
polymorphism and higher-order functions in the language; "compiler issues" such as using static
vs. dynamic typing, and type inference vs. type declaration; and "run-time system issues"
such as using a tagged vs. untagged object representation model, and using type maintenance vs.
type reconstruction to obtain type information at run-time. We classify various programming
languages on the basis of these issues. We also discuss our approach of complete run-time
type reconstruction with an untagged object representation model and discuss some of its
applications such as source debugging, tagless garbage collection and I/O.

Chapter 6 motivates the problem of compiler-directed polymorphic type reconstruction by
means of examples and describes the technique informally. First, we describe the logical
execution model of an Id program, dividing the work into compile-time, link-time, invocation-time
and run-time. Then, we characterize the type information that needs to be recorded at
compile-time to permit complete type reconstruction at run-time. Next, we informally describe how to
analyze and translate the source program to propagate this information. Finally, we show the
process of run-time type reconstruction using an example and discuss some optimizations.

Chapter 7 formalizes the concepts of Chapter 6 using a simplified kernel language for Id. 
This language is very close to the actual intermediate form used within the Id compiler. We 
present the analysis and program translation rules to generate and propagate all the necessary 
type information at compile-time. We also present a formal algorithm for type reconstruction 
and prove its correctness. 

Finally, Chapter 8 discusses a full scale application of type reconstruction, tagless garbage
collection. We describe a study that compares the performance of our type-reconstruction based
garbage collection scheme with conservative garbage collection and a compiler-directed explicit
deallocation scheme.

1.4.3 How to Read this Thesis 

Both Part I and Part II are self-contained and may be read independently. 

For Part I, Chapter 2 should be sufficient for readers that are only interested in
understanding the problem, its context, and the intuitive ideas behind the proposed solution. Readers
interested in the mechanics of the proposed type system and its extensions, possibly with a
view towards implementing it, should look at the semantic machinery described in Sections 3.1
and 3.2 of Chapter 3, the extensions discussed in Chapter 4, as well as the type inference
machinery described in Chapter 3 of [Ler92]. Of course, theoretical enthusiasts may want to go
through all the detailed proofs provided in Chapter 3.

For Part II, Chapter 5 and Chapter 6 provide a general introduction to the idea of using type
information at run-time, an intuitive description of the issues involved, and the technique of
complete type reconstruction and its various applications. Chapter 7 is a must for readers
interested in the detailed understanding and implementation of the type reconstruction mechanism,
although the last section on the correctness proof of the reconstruction algorithm is mainly of
theoretical interest. Finally, Chapter 8 provides a realistic perspective on the potential uses of
this technique in the context of tagless garbage collection, and its cost trade-offs.






Part I 



Types in Language Design: 
Functional Encapsulation 






Chapter 2 

Functional Encapsulation of 
Imperative Data-Structures 



In this chapter, we study the problem of providing a suitable type abstraction mechanism
between the user-level layer and the underlying system-level layer in a programming language.
We introduce a new language construct called close that provides a statically verifiable, safe
export mechanism for imperative data-structures from the system-level layer into the functional
user-level layer. We present several examples illustrating the usefulness of this construct and
discuss the technical issues involved in proving its soundness. We also compare our approach
to other systems in the literature.

2.1 The Problem 

Let us consider the problem of implementing functional arrays in Id that are homogeneous,
non-mutable, polymorphic arrays. The library function make_vector creates a one dimensional
functional array that memoizes a computation for a given index range as shown in the following
example: 1

Example 2.1: 

def compute i = ... some large computation ...; 
compute_memo = make_vector compute (1,10); 

How is the function make_vector implemented? Operationally, one has to allocate an empty
vector and fill it with the result of applying the given function to each index position. There are
two possibilities. We could treat make_vector as a language primitive and hard-wire it within
the compiler. Then, we would have to provide a slew of such primitive functions that define
functional vectors, matrices, and higher dimension arrays, along with their common patterns
of construction. Some languages (including Id) provide special array construction syntax called
array comprehensions to alleviate this problem. While array comprehensions are convenient,
they still do not cover many useful construction patterns. They also increase the complexity of
the compiler and the language it must manipulate. Moreover, this solution does not extend to
user-defined functional abstractions beyond those already present in the language.



1 All our examples use the Id language syntax [Nik91]. We will provide brief explanations as necessary.
Function definitions in Id are introduced with the keyword def, all statements are terminated with a semi-colon
(;) and application is by juxtaposition.






The other possibility is to provide an imperative kernel language in which make_vector
and other array construction functions may be defined in a separate library. Special syntactic
constructs like array comprehensions may also be desugared into this kernel language. The
kernel language would support primitive operations such as simple arithmetic, allocating a vector,
storing/fetching a value at a particular index of a vector, and simple control mechanisms such
as iteration and procedure call. This is the approach taken in Id. This approach also enables
a system programmer to implement arbitrary new abstractions without changing the language
definition or the compiler. As an example, the make_vector function may be implemented in
the array library as shown below: 2

Example 2.2: 

def make_vector f (l,u) =
  { a = i_vector (l,u);
    _ = { for i <- l to u do
            a[i] = f i };
   in a };

Here, i_vector is a kernel primitive that allocates an empty one dimensional I-structure
array, which is filled with the result of applying the filling function to each index position.
Under the non-strict, parallel evaluation model of Id, the array a is returned as soon as it is
allocated; its filling loop executes in parallel. However, this does not create any race condition
for the array because the I-structure protocol [ANP89] supports fine-grain producer-consumer
synchronization on every memory location: multiple readers wait at an empty location until a
single writer fills it with the desired value.
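The write-once behaviour described above can be imitated sequentially. The following Python sketch is only an illustration under stated assumptions: the class `IStructureCell` and its `put`/`get` methods are hypothetical names, the loop runs sequentially rather than in parallel, and a read of an empty cell raises an error where a real I-structure read would wait for the writer.

```python
class IStructureCell:
    """Hypothetical write-once cell: empty until it is filled exactly once."""
    _EMPTY = object()

    def __init__(self):
        self._value = IStructureCell._EMPTY

    def put(self, value):
        # A second write violates the single-writer discipline.
        if self._value is not IStructureCell._EMPTY:
            raise RuntimeError("I-structure cell written twice")
        self._value = value

    def get(self):
        # A real I-structure read would block until a writer arrives;
        # this sequential sketch reports the empty read instead.
        if self._value is IStructureCell._EMPTY:
            raise RuntimeError("read of an empty I-structure cell")
        return self._value


def i_vector(lower, upper):
    """Allocate an empty vector indexed lower..upper inclusive."""
    return {i: IStructureCell() for i in range(lower, upper + 1)}


def make_vector(f, bounds):
    """Allocate a vector and fill each index i with f(i)."""
    lower, upper = bounds
    a = i_vector(lower, upper)
    for i in range(lower, upper + 1):
        a[i].put(f(i))
    return a


squares = make_vector(lambda i: i * i, (1, 10))
```

Note that the caller of `make_vector` in this sketch still receives cells that expose `put`, which is the kind of leftover assignability that the following paragraphs identify as a problem.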

Nevertheless, as it stands, there are some technical problems with the above implementation.
I-structures and M-structures are imperative constructs, i.e., they can be assigned to, 3
whereas functional arrays are supposed to be non-imperative. Therefore, returning an assignable
I-structure from make_vector is not appropriate. Furthermore, in the Hindley/Milner type
system, imperative objects are allowed only a restricted form of polymorphism to ensure type-safety
[Tof90]. Thus, the functional arrays implemented using I-structures in this manner would have
restricted polymorphism, which reduces the utility of such library implementations.

Both problems described above may be solved by providing the ability to package the above
implementation of the make_vector function into a type-safe, polymorphic, functional
abstraction as required by its intended interface. In general, the kernel language should contain a
general type abstraction mechanism that can properly encapsulate such imperative
implementations of functional data-structures while ensuring their polymorphism and non-mutability
outside the abstraction.

2 All bindings within a let-block (enclosed within {}) execute in parallel. The bindings may be mutually
recursive and their textual order is unimportant. An underscore (_) on the left-hand-side of a binding implies
that the result of the right-hand-side expression is to be ignored. The result of the overall block is the value of
the expression following in.

3 Strictly speaking, I-structures are not mutable, since they have write-once semantics, but an empty
I-structure can be filled with any value using assignment.

2.1.1 Abstraction and Polymorphism

It may appear that a conventional abstract datatype facility available in most modern languages
should be sufficient for our purpose. Indeed, we could write a functional array datatype that
is internally represented using I-structures and does not allow any mutation capability in its
abstract interface. But such an abstraction is still not completely satisfactory because it only
hides the internal data representation of the functional datatype; it does not automatically
restore the full polymorphism of the functional datatype from the restricted polymorphism of
its imperative implementation. This polymorphic strengthening of the datatype has to be done
explicitly and under additional semantic analysis that guarantees its soundness. As we will see
in Section 2.2, the treatment of type polymorphism is significantly more complicated by the
presence of imperative constructs under the usual call-by-value semantics.

A radically different but equally interesting approach is to define a call-by-name semantics
for the polymorphic objects within the kernel language and permit unrestricted polymorphism
for imperative programs. In this case, the conventional data abstraction mechanism would be
sufficient to hide the imperative implementation of a functional datatype. In this alternate
semantics, called polymorphism-by-name [Ler93], the evaluation of a polymorphic object is
suspended and each type instantiation re-evaluates the suspension in the current context to
produce a fresh instance. In contrast, the usual ML-like polymorphism is called
polymorphism-by-value, where polymorphic objects are evaluated only once and the resulting value is shared
among all its instances.
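The operational difference between the two semantics can be sketched in a few lines of Python. This is an untyped illustration only: `make_cell` is a hypothetical stand-in for an expression that allocates a mutable cell holding a polymorphic value, a one-element list plays the role of the cell, and "instantiation" is modelled simply as a use of the object.

```python
def identity(x):
    return x


def make_cell():
    # Hypothetical stand-in for allocating a fresh mutable cell
    # that initially holds a polymorphic value.
    return [identity]


# Polymorphism-by-value: evaluate once, share the one cell among all uses.
shared = make_cell()
use_a_value, use_b_value = shared, shared

# Polymorphism-by-name: keep a suspension and re-evaluate it at each use.
suspended = make_cell
use_a_name, use_b_name = suspended(), suspended()

# Update the cell through the first use in each setting.
use_a_value[0] = abs
use_a_name[0] = abs
```

Under the by-value reading the update is visible through `use_b_value`, whereas under the by-name reading `use_b_name` is a fresh cell that still holds `identity`; the two uses no longer share state.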

Leroy showed in [Ler93] that the naive Hindley/Milner typing rules are sound with respect to
polymorphism-by-name semantics for imperative references and continuations. This approach
is used in languages like Quest [Car89] and to a limited extent in CLU [LAB+81] where explicit
type parameters are used to abstract and instantiate polymorphic objects. Unfortunately,
suspension and re-evaluation of polymorphic objects destroy their sharing characteristics, which
are very important in the dynamic semantics of the Id language. Therefore, we would like to
improve upon the abstraction characteristics of the polymorphism-by-value type systems that
preserve such sharing.

In another case, Wright experimented with the type system of Standard ML by restricting
polymorphism to only certain classes of syntactically recognizable values such as function
declarations, constants, and known functional constructors [Wri93]. These functional values
can be recognized statically and therefore can be generalized and shared safely. Mutable
data-structures are always classified as dynamic entities and therefore can never be generalized. He
showed empirical evidence that this restricted form of polymorphism is sufficient for a large
class of existing Standard ML programs. Unfortunately, our ultimate goal is to provide a
functional and polymorphic view of dynamically created imperative data-structures, for which this
approach is entirely inadequate.

2.1.2 Outline of our Approach 

We can divide our problem into two distinct phases. First, it is important to be able to give
sound and accurate type semantics to imperative constructs in the kernel language. We must
precisely capture the imperative types of mutable objects and propagate them with first class
status while handling higher-order functions, storing into data-structures and passing them
around as arguments and results.

The second phase involves presenting a functional view of the mutable objects to the end 
user. This may involve a semantic check on the part of the compiler (or the system programmer) 
as well as some sort of type conversion to convert the imperative types into functional types. The 
type system must ensure that fully functional and polymorphic behavior is projected through 
the abstraction both in static and dynamic semantics. We present a new language construct 
called close that achieves this functionality through the type system. 

The interaction of polymorphism and imperative programming has been the subject of
active research in the past decade [Dam85, Tof90, AM89, LW91, Ler92, TJ92, Wri92]. Several
type systems have been proposed in the literature spanning a wide range of expressiveness and
complexity. We present a brief survey in Section 2.2. Since our main task is to provide an
encapsulation mechanism for imperative program fragments, we prefer to extend an existing
imperative type system that meets our needs rather than design a new one. We have chosen
the Closure Typing system proposed by Xavier Leroy in his thesis [Ler92] as a convenient
starting point for our encapsulation extensions. We motivate this choice in Section 2.2.7.
In Section 2.3, we informally describe the meaning and the use of the close construct via
examples and discuss issues of type-safety, non-mutability, efficiency, parallelism, and memory
synchronization in their context. Finally, in Section 2.4, we present informal typing strategies
that ensure the soundness of the close construct.

2.2 Imperative Type Systems 

It is well known that the simple Hindley/Milner type system yields unsound typing when applied
to mutable data-structures in the naive way. In this section, we briefly review this problem and
describe some practical extensions to the type system that handle it to some extent.

2.2.1 Simple Hindley/Milner Type Inference 

Consider the following example 4 in Id that emulates the ref construct of ML using a naive,
functional Hindley/Milner type system:

Example 2.3: 

type ref t0 = mkref !t0;         % mkref :: ∀t0. t0 -> (ref t0)
r = mkref identity;              % r :: ∀t1. (ref (t1 -> t1))
r!!mkref_1 = square;
_ = r!!mkref_1 true;             % Dynamic Type Error!

The datatype ref defines a polymorphic constructor mkref that allocates a mutable cell and 
initializes it with a given value. The value contained in the cell can subsequently be updated 
by field assignment as shown above. 

The type schemes 5 for the constructor mkref and the mutable cell r inferred by the naive
type system are shown on the right. Note that the mutable cell r is given a polymorphic type
which can be instantiated to int -> int or bool -> bool as desired. Thus, this example passes
the type system even though it causes a run-time type-error, attempting to apply an integer
function square to a boolean true. The problem is that the type of a mutable object should not
be deemed polymorphic even if it initially contains a polymorphic value. This is because later
such objects may be updated to contain values that do not possess the expected polymorphism.
The type system must be aware of such mutable objects and keep track of their types in a
sound manner.
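The run-time behaviour that the naive typing fails to rule out can be replayed in an untyped setting. The Python sketch below is an illustration under assumptions: `Ref` is a hypothetical one-field mutable cell playing the role of mkref, and because Python treats booleans as integers, `square` carries an explicit check that stands in for the dynamic type error described in the text.

```python
def identity(x):
    return x


def square(n):
    # Python treats bool as a subclass of int, so this explicit check is only
    # a stand-in for the run-time type error that the text describes.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("square expects an integer")
    return n * n


class Ref:
    """Hypothetical one-field mutable cell, playing the role of mkref."""

    def __init__(self, value):
        self.contents = value


r = Ref(identity)        # r initially holds a polymorphic function
r.contents = square      # later update: the contents now accept integers only

try:
    r.contents(True)     # the "bool -> bool" use of r
    outcome = "no error"
except TypeError:
    outcome = "dynamic type error"
```

The failure arises only because the cell was updated between its creation and its boolean use, which is exactly the possibility a sound type system has to account for.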

4 User-defined types and constructors in Id are introduced with the keyword type. A (!) in front of a
constructor field denotes that it is mutable, and can be used in M-take/M-put (!) or examine/replace (!!) operations
during field access. Fields are accessed by position using numeric suffixes (starting from 1) to the constructor
name. Although Id has parallel semantics, our examples assume a sequential order of evaluation for simplicity.

5 A type-scheme is a polymorphic Hindley/Milner type containing type variables, such as t0, t1, ..., some of
which may be bound by the universal quantifier (∀). Bound type variables may be substituted for different types
in different contexts giving rise to polymorphic type-instances.

One way to avoid such unsound polymorphism is to statically approximate the state of the
mutable store and the set of objects stored within it. These (presumably) mutable objects are
not allowed to have polymorphic types. The mutable store approximation needs to be updated
whenever there is a possibility of allocating a new mutable object or updating an existing
mutable object. This has to be achieved in a flexible but sound manner within and across
function and local block boundaries.

Many type systems in the literature follow this general framework [Dam85, Tof90, LG88,
JG91, Wri92, TJ92]. The various systems differ in their notion of a store abstraction and the
amount of information propagated across function boundaries. An illustrative comparison of
some of these systems is presented in [OJ91]. First, we will briefly describe two such systems
that are simple extensions of the original Hindley/Milner type system and have been successfully
used in practical programming languages. Then, we will describe two more recent type systems
that are more complex but are much more powerful in dealing with higher-order functions.

2.2.2 Type System of Standard ML 

In Standard ML [MT91, MTH90], type variables are syntactically classified into two separate
categories: imperative type variables (u0, u1, ...) that may occur in the type of a mutable
object at some stage of type inference and therefore implicitly model the abstract mutable store,
and applicative type variables (t0, t1, ...) that can never occur in the type of a mutable object.
Furthermore, since the evaluation of variables and λ-expressions (termed as non-expansive
expressions) never generates any new mutable objects, imperative type variables occurring in
their types are allowed to be generalized. 6 On the other hand, applications and let-expressions
(termed as expansive expressions) may allocate new mutable objects on evaluation; therefore,
imperative type variables occurring in their types are not allowed to be generalized. The resulting
type system is sound [Tof90], easy to implement, and correctly rejects Example 2.3 as a type-error.
To see this, note that under this scheme, storage allocating functions such as mkref always
contain imperative type variables in their type-schemes because they allocate and return mutable
memory locations. Thus, the type of r cannot be generalized since it will contain an imperative
type variable that is created in an expansive expression (application).
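The syntactic nature of this rule can be made concrete with a small sketch. The Python below is a toy illustration, not the Standard ML algorithm: the expression classes `Var`, `Lam`, `App` and `Let` and the functions `is_expansive` and `generalizable` are hypothetical names, and the usual side condition that a generalized variable must not occur free in the type environment is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Lam:
    param: str
    body: object


@dataclass
class App:
    fn: object
    arg: object


@dataclass
class Let:
    name: str
    bound: object
    body: object


def is_expansive(expr):
    """Syntactic test from the text: variables and lambda-expressions are
    non-expansive; applications and let-expressions are expansive."""
    return isinstance(expr, (App, Let))


def generalizable(expr, type_vars):
    """Return the type variables that may be generalized for `expr`.

    `type_vars` maps each variable name to "imperative" or "applicative".
    Applicative variables are generalized freely in this sketch; imperative
    ones only when the expression is non-expansive.
    """
    expansive = is_expansive(expr)
    return {v for v, kind in type_vars.items()
            if kind == "applicative" or not expansive}


# r = mkref identity: an application whose type mentions an imperative variable
r_expr = App(Var("mkref"), Var("identity"))
# mkref on its own: a variable, so its imperative variable may be generalized
mkref_expr = Var("mkref")
```

Because the test looks only at the outermost syntactic form, any application is treated as expansive regardless of what the applied function actually does at run-time.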

One of the problems with this system is that the modeling of imperative objects is too 
simplistic. The imperative type variables model values contained in a mutable location rather 
than the locations themselves. This has the effect of "contaminating" the types of the values 
fetched out of mutable locations. Consider the following example: 

Example 2.4:

def identity x = x;                       % identity :: ∀t0. t0 -> t0
def identity' x = (mkref x)!!mkref_1;     % identity' :: ∀u0. u0 -> u0
nil' = identity' nil;
x = 1:nil';
y = true:nil';                            % Static Type Error!

Although identity' is assigned a polymorphic type, it is still weaker than the type of the
identity function. This is because the identity' function temporarily stores its argument
within an imperative location. This contaminates the type of the returned result to be
imperative and unnecessarily restricts its polymorphism as shown. We would have liked to assign the
same type to both identity functions.

6 Type generalization refers to the process by which some type variables occurring in a type are bound with a
universal quantifier (∀) converting that type into a polymorphic type scheme.

Another problem with this system is that the distinction of expansive and non-expansive
expressions is also very simplistic. In particular, this system cannot deal with higher-order or
partially applied imperative functions. Consider the following example:

Example 2.5: 

mkref' = identity mkref;                  % mkref' :: u0 -> (ref u0)
x = mkref' 1;
y = mkref' true;                          % Static Type Error!

The application of the identity function to mkref strips out the polymorphism of mkref because
the type system deems this application expansive, whether or not any mutable reference is
allocated within the identity function. This causes an unnecessary type-error to be flagged
by the type system. The following example illustrates a similar problem for partial applications:

Example 2.6: 

def imp_map f l =                         % imp_map :: ∀u0 u1. (u0 -> u1) -> (list u0) -> (list u1)
  { arg = mkref l;                        % arg :: (ref (list u0))
    res = mkref nil;                      % res :: (ref (list u1))
   in
    {while not (nil? arg!!mkref_1) do
       x:xs = arg!mkref_1;
       arg!mkref_1 = xs;
       res!mkref_1 = f x : res!mkref_1;
     finally reverse res!mkref_1}};

def fn_map f nil = nil                    % fn_map :: ∀t0 t1. (t0 -> t1) -> (list t0) -> (list t1)
 |  fn_map f (x:xs) = f x : fn_map f xs;

list_identity = imp_map identity;         % list_identity :: (list u2) -> (list u2)
u = list_identity (1:2:nil);
v = list_identity (true:false:nil);       % Static Type Error!

Just like the identity' function in Example 2.4, the type-scheme assigned to the function
imp_map in the above example contains imperative type variables because it uses mutable
locations internally, while its functional version fn_map carries only applicative variables.
Furthermore, when using imp_map, although no actual allocations take place until after its second
argument is supplied, the type system has no way to determine this and it deems the first
application to be expansive as well. This results in a non-polymorphic type for the list_identity
function as shown. This problem of typing partial applications was fixed in part by the type system of
Standard ML of New Jersey, which we discuss next.

2.2.3 Type System of Standard ML of New Jersey 

The type system of Standard ML of New Jersey [AM89] assigns an integer rank to each
imperative type variable. We write these ranks as superscripts on the type variables. A rank-0
imperative type variable u^0 occurring within the type of an expression at the top-level indicates
that the type of some existing mutable object already contains u^0 and therefore u^0 should not
be generalized. Such a type variable is said to have entered the mutable store typing. For an
imperative type variable u^d of positive rank d (d > 0) occurring within a function type, the rank
denotes the number of applications after which u^d will enter the store typing. Therefore, u^d is
allowed to be generalized for up to d - 1 partial applications involving the function type, where each
partial application reduces its rank by one. This scheme is extended to typing objects enclosed within
λ-abstractions by keeping track of the number of applications necessary to make them enter the
store typing. The resulting type system is slightly more complex than that of Standard ML but still
relatively easy to implement and has recently been shown to be sound [HMV93].

Without going into details, it should be clear that this modification handles the function
list_identity in Example 2.6 quite well. The type of the function imp_map is now inferred to
be ∀u0^2 u1^2. (u0^2 -> u1^2) -> (list u0^2) -> (list u1^2), where the superscript 2 denotes that the actual
allocation of imperative objects in the function's body does not take place until after the second
application. Therefore, the type of list_identity is inferred to be ∀u1^1. (list u1^1) -> (list u1^1),
where the superscript 1 denotes the fact that one more application of this function will create
some fresh mutable memory locations.
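The rank bookkeeping for this example amounts to simple arithmetic. The following Python sketch is a simplification under assumptions: the helper names `rank_after` and `may_generalize` are hypothetical, ranks are plain integers, and nothing here reflects how the SML/NJ compiler actually represents them.

```python
def rank_after(rank, applications):
    """Each (partial) application lowers the rank by one, bottoming out at 0."""
    return max(rank - applications, 0)


def may_generalize(rank):
    """A rank-0 variable has entered the store typing and stays monomorphic."""
    return rank > 0


imp_map_rank = 2                                      # allocation after two applications
list_identity_rank = rank_after(imp_map_rank, 1)      # imp_map identity
result_rank = rank_after(imp_map_rank, 2)             # imp_map identity some_list
```

Here `list_identity_rank` is still positive, so the partial application remains generalizable, while `result_rank` has dropped to zero and the corresponding type variables are no longer generalized.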

Unfortunately, the simple ranking mechanism outlined above is still not sufficient to deal
with imperative higher-order applications as shown in Example 2.5. The type system does
not have any way to characterize when and how to incorporate "potentially" imperative type
information from arguments of higher-order functions within their final result. Therefore, the
type system must conservatively assume that all imperative functions generate mutable objects
when passed as arguments to higher-order functions. The following comparison illustrates this
point:

Example 2.7:

def ap_nil f = f nil;                     % ap_nil :: ∀t0 t1. ((list t0) -> t1) -> t1
foo = ap_nil mkref;                       % foo :: (ref (list u0^0))
mkref' = identity mkref;                  % mkref' :: u0^0 -> (ref u0^0)

Here, the imperative function mkref is passed as an argument to two polymorphic functions
ap_nil and identity. In the first case (identifier foo), the type of the application is correctly
inferred to be non-polymorphic because it actually creates a fresh mutable reference. But in
the second case (identifier mkref'), the type of the application is unnecessarily non-polymorphic
because the mkref function is never applied within the body of the identity function. The
type of the identifier mkref' should in this case be ∀u1^1. u1^1 -> (ref u1^1), which is identical to the
type of the constructor mkref in this system. The problem is that the type system has no way
of knowing that the function ap_nil applies its parameter f to one argument and therefore
may potentially create mutable references, while identity passes its parameter unchanged and
therefore cannot create any mutable references. Hence, the type system must conservatively
assume that all imperative functions create mutable references when passed as arguments.

The formalization for the above type system presented in [HMV93] is somewhat more
powerful than the SML/NJ compiler implementation and it can deal with the above situation
correctly. However, it requires a more complicated mechanism for rank book-keeping and uses
rank variables instead of fixed integral ranks. It also entails a more complicated type unification
mechanism that needs to resolve algebraic constraints on rank variables. The interested reader
is referred to [HMV93].

2.2.4 Limitations of the Standard ML Type Systems 

Although the two type systems presented above cover a lot of practically useful cases of
imperative programming, they are still not sufficiently powerful for our purposes. Ultimately,
we intend to smoothly convert mutable types into functional types, so our type system must
not only propagate the mutable type information properly where necessary, but also keep it
self-contained and easy to manipulate. The problem of type variable contamination as shown
in Examples 2.4 and 2.6 is a serious one in this regard. None of the systems presented above
have the ability to assign the same polymorphic type to functions identity and identity' or
functions fn_map and imp_map. At some observational level, these functions are equivalent but
the internal implementation of the imperative versions shows up in their type and hence they
are not interchangeable with respect to these type systems.

Another fundamental problem with the above systems is that they concentrate on modeling
"imperativeness" of objects only to the extent it affects their type polymorphism. For instance,
there is no difference between the type of a record that contains a functional integer field and the
one that contains a mutable integer field. Since we are ultimately interested in approximating
dynamic mutability of all objects by means of their static types (whether polymorphic or not),
the partial modeling offered by the type systems above is also unsatisfactory for our purposes.

Both observations above show that tying the "imperativeness" of mutable objects to the
kind of type variables contained in their types is rather simplistic and imprecise. We will now
look at some type systems where this information is tracked independently, leading to a much
more complete and cleaner characterization of imperative objects.

2.2.5 Effect Systems 

Effect systems are a broad class of polymorphic typing systems that use static type-checking and
inference techniques to model the dynamic behavior of programs written in imperative languages
[Luc87, LG88, TJ92, Wri92]. Originally, such systems were used to collect and propagate
side-effect information across program fragments for compiler optimizations and parallelization
[LG88]. One such type and effect system was successfully used in the FX-87 language [GJLS87]
which supported explicitly declared type polymorphism.

More recently, automatic type and effect inference techniques have been developed [TJ92,
Wri92] that use the effect propagation mechanism to infer types that model polymorphic
imperative objects more accurately than the systems given above. As we will see shortly, such type
and effect systems can be viewed as a logical extension of the type systems described above.

Effect Analysis 

Probably the most appealing aspect of effect systems is their uniform and integrated mechanism
of type and effect information propagation across all function and local block boundaries. The
key idea is that every expression generates a read/write/allocate effect which is accumulated
along with its type. The effect of the body of a function, parameterized by the effect of its formal
arguments, is summarized as the latent effect of the function on the arrow type-constructor
(->) in its type. Functions by themselves have no immediate effect. Unknown latent effects for
functional parameters of higher-order functions are modeled using effect-variables. This effect
parameterization permits a clean way of computing the overall effect of a function application
by instantiating its latent effect by the effects of its actual arguments. The effect information
propagated and accumulated in this manner may then be used to accurately identify the creation
of polymorphic imperative objects and avoid their unsafe generalization.

In one of the simpler effect systems proposed by Wright [Wri92], all type variables present
in the type of a freshly allocated mutable data-structure are collected as part of the effect of
that allocation. The explicit effect computation and propagation mechanism obviates the need
to mark such type variables as imperative. Unsound typings are then avoided by disallowing
generalization of type variables that occur in the immediate effect of an expression. This system
still does not deal with the issues of imperativeness and type polymorphism independently, but
at least the information flow across higher-order function boundaries is improved because of
the effect propagation techniques.

As an example, consider the function fn_map shown below: 

Example 2.8: 

fn_map :: ∀t0 t1 f0. (t0 -{f0}-> t1) -{∅}-> (list t0) -{f0}-> (list t1) 

def fn_map f nil = nil 
 |  fn_map f (x:xs) = f x : fn_map f xs; 

mkref :: ∀t0. t0 -{alloc(t0)}-> (ref t0) 

ref_list :: (list (ref (list t2))) with immediate effect {alloc(list t2)} 
ref_list = fn_map mkref (nil:nil); 

f0, f1, ... are effect-variables which may be substituted for any effect. ∅ denotes the null 
effect. Effect-variables are allowed to be generalized and instantiated just like type variables. 
The type of the function fn_map illustrates the use of these effect variables. The latent effect 
of the mapped function is captured in the effect variable f0 that is exposed in the final effect 
of the fn_map function. 

The example also shows the type of the reference allocator function mkref. The latent effect 
attached to the arrow (->) shows that the function allocates a mutable object of type t0. 
As shown in the example, the effect of mapping mkref to a polymorphic list instantiates and 
exposes its latent effect of allocating mutable cells containing polymorphic objects. Since t2 is 
present in the immediate effect of the expression creating ref_list, it cannot be generalized. 
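
The generalization rule used here can be stated in a few lines. The Python fragment below is only a sketch of that rule as described in the text; the function name generalizable and the representation of type variables as strings are our assumptions, not part of [Wri92].

```python
def generalizable(type_vars, env_vars, effect_vars):
    """Type variables that may be quantified: those of the type that are
    neither free in the environment nor present in the immediate effect."""
    return set(type_vars) - set(env_vars) - set(effect_vars)

# ref_list :: list (ref (list t2)) with immediate effect {alloc(list t2)}
ref_list_vars = generalizable({"t2"}, set(), {"t2"})

# A polymorphic list built without any allocation effect, e.g. nil :: list t0
nil_vars = generalizable({"t0"}, set(), set())
```

For ref_list no variable survives, so its type stays monomorphic in t2, while a list built without allocation effects remains fully polymorphic.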

This system infers the type-scheme ∀t0 t1 f0. (t0 -{f0}-> t1) -{∅}-> (list t0) -{{t0,t1} ∪ f0}-> (list t1), 
with the effect labelling each arrow written in braces, for the 
function imp_map of Example 2.6 (cf. fn_map of Example 2.8). Note that the first application 
has no effect, and the second application records the effect of allocating new local memory 
references for internal identifiers arg and res (as a set of type variables t0, t1 occurring in those 
reference types) as well as the effect of applying the argument function f (captured via the 
effect-variable f0). Thus, in this system, partial curried applications do not expose the final 
effects prematurely, but the problem of type contamination by unnecessarily exposing local 
effects still exists. 

Principal Types and Minimal Effects 

In order to compute the type and the effect of every expression automatically and efficiently, 
one must show that the system admits unique principal types and effects for expressions and 
that they are computable using an efficient inference algorithm. At least two effect-based 
systems [TJ92, Wri92] propose such inference mechanisms based on structural unification [Rob65]. 
The effect system of FX-91 [GJS091] uses the more complex algebraic unification [JG91] which 
permits unification modulo algebraic identities such as associativity and commutativity. This 
provides more expressive power to the inference system, albeit at the cost of simplicity and 
efficiency. Here, we will only discuss the inference system based on standard structural unification. 
The basic idea is to compute the principal types of expressions in the usual way using the 
standard Hindley/Milner type inference mechanism while accumulating a set of constraints 
for the latent effect of all the function types in the program. Then, this constraint set is 
solved separately to obtain the minimal effect of each function in the program. This process 
is not completely straightforward because of the possibility of cyclic constraints created due to 
mutually recursive functions. The following examples illustrate this problem:⁷ 



⁷ The latent effects of functions are represented in this system by a constrained effect-variable. The constraints 






Example 2.9: 

def f0 x = f1 x;               % f0 :: t0 -{f0}-> t1 with {f0 ⊒ f1} 

def f1 x = f0 x;               % f1 :: t0 -{f1}-> t1 with {f1 ⊒ f0} 

def g x = { a = mkref x;       % g :: t0 -{f0}-> (list (ref t0)) with {f0 ⊒ ({t0} ∪ f0)} 
           in a:(g x) }; 

def h x = { a = mkref h;       % h :: int -{f0}-> int with {f0 ⊒ {int -{f0}-> int}} 
           in x+1 }; 

Minimal effects in the above cases are computed by combining the effects of all cyclic 
constraints into one and finding the least assignment to effect-variables (starting from the null 
effect ∅) that would satisfy all the constraint inequations. Thus, functions f0 and f1 in the above 
example are each assigned the null effect ∅ and the function g gets the effect {t0}. The function 
h represents an interesting case. Depending on the desired semantic interpretation of effects, 
the least effect satisfying this constraint may be taken to be infinite and such expressions may 
be classified as ill-formed (system [TJ92]), or this constraint may be simplified to {f0 ⊒ {f0}} 
which yields the null effect ∅ as the minimal solution (system [Wri92]). 
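
The least solution of such constraints can be sketched as follows. This Python fragment uses plain iteration to a fixed point starting from the null effect, rather than first merging cyclic constraints as described above; both yield the same least assignment for finite effect sets. The constraint encoding (effect-variable, atoms, effect-variables) and the names e_f0, e_f1, e_g and e_h are our assumptions. For h we use the simplified form of the constraint attributed to [Wri92].

```python
def minimal_effects(constraints):
    """Least solution of constraints (var, atoms, deps), each meaning
    var ⊒ atoms ∪ (union of the current solutions of the variables in deps)."""
    solution = {}
    for var, _, deps in constraints:
        solution.setdefault(var, set())
        for dep in deps:
            solution.setdefault(dep, set())
    changed = True
    while changed:
        changed = False
        for var, atoms, deps in constraints:
            bound = set(atoms).union(*(solution[dep] for dep in deps))
            if not bound <= solution[var]:
                solution[var] |= bound   # grow monotonically from the null effect
                changed = True
    return solution

effects = minimal_effects([
    ("e_f0", set(), {"e_f1"}),    # f0 calls f1
    ("e_f1", set(), {"e_f0"}),    # f1 calls f0
    ("e_g", {"t0"}, {"e_g"}),     # g allocates a (ref t0) and calls itself
    ("e_h", set(), {"e_h"}),      # h, after simplifying its constraint
])
```

The mutually recursive pair and h receive the null effect, and g receives {t0}, in agreement with the discussion above.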

Region Analysis and Effect Masking 

Some effect systems also carry out a region analysis of memory allocation and sharing [LG88, 
TJ92]. The static description of an expression also summarizes a conservative approximation of 
the memory regions (locations) manipulated within the expression, in addition to its type and 
effects. If a set of regions is found to be purely local to an expression, i.e., if these regions are not 
accessible through a free variable of the expression, and if they are not exported via the result 
of the expression, then the effects associated with those regions may be erased from the overall 
effect of that expression. The idea is that only certain "observable" effects on "visible" regions 
need to be kept; the rest may be safely erased without affecting the semantics of the program. 
This is known as effect masking. This analysis may be able to mask all the side-effects to internal 
data-structures of a procedure, which largely alleviates the problem of type contamination. In 
this sense, this scheme is capable of automatically assigning purely functional types for some 
classes of imperative programs. For example, these stronger systems are able to infer the same 
type for imp_map and fn_map (namely, ∀t0 t1 f0. (t0 -{f0}-> t1) -{∅}-> (list t0) -{f0}-> (list t1), 
with the effect labelling each arrow written in braces) since the 
mutable references created within imp_map can be masked. 
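
The masking condition itself is easy to state operationally. The following Python sketch assumes that effects are (kind, region) pairs and that the regions reachable from the free variables and from the result type have already been computed; the function name mask_effects and all region names are hypothetical.

```python
def mask_effects(effects, env_regions, result_regions):
    """Keep only effects on regions visible through a free variable or the result."""
    visible = set(env_regions) | set(result_regions)
    return {(kind, region) for kind, region in effects if region in visible}

# imp_map: the references arg and res live in purely local regions.
imp_map_effects = {("alloc", "r_arg"), ("write", "r_arg"),
                   ("alloc", "r_res"), ("read", "r_env")}
masked = mask_effects(imp_map_effects, env_regions={"r_env"}, result_regions=set())

# A function returning its freshly allocated vector: the region escapes
# through the result, so nothing can be erased.
vector_effects = {("alloc", "r_vec"), ("write", "r_vec")}
unmasked = mask_effects(vector_effects, env_regions=set(), result_regions={"r_vec"})
```

Only the effect on the externally visible region survives for imp_map, whereas the effects on a returned vector are all retained.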

Region analysis requires a lot of book-keeping to maintain a very fine static notion of the 
mutable store. The benefit of obtaining this additional region information and performing 
effect masking has to be weighed against the extra complexity required to do these analyses in 
a practical language implementation.⁸ Furthermore, effect masking does not cover all cases of 
effect erasure that we are interested in. For example, the effect generated by the make_vector 
function of Example 2.2 cannot be masked since the mutable vector is being returned as the 
result of the function. The user can still update this vector and destroy the type polymorphism 
that might result by erroneously masking this effect. 



are of the form (effect-variable ⊒ effect), which means that the effect on the right-hand side is a lower bound on 
the actual effect denoted by the effect-variable on the left-hand side. 

⁸ Indeed, region analysis was dropped from the FX-91 language [GJS091], which is a more recent version of FX-87 
[GJLS87]. 






Analysis of Effect Systems 

On the whole, effect systems seem to be a powerful tool to summarize a variety of dynamic 
behaviors of programs accurately. But we still have to extend the effect masking analysis to meet 
our original goal, which is to transparently encapsulate imperative programs into functional 
abstractions. We also anticipate that external factors such as a user-declared functional interface 
will play an important role in guaranteeing type-soundness in our system in spite of otherwise 
non-maskable imperative effects in the program. None of the existing effect systems incorporate 
such information. In Section 2.4, we will explore some of these ideas where a powerful type 
system is combined with user-supplied information while trading off some of its power for speed 
and simplicity. 

2.2.6 Syntactic Closure Typing System 

All the type systems we have seen so far model the state of the dynamic mutable store and the 
operations performed on it using some static approximation, and then use that information to 
identify the objects that can safely be assigned polymorphic types. Instead of approximating the 
dynamic behavior of the program, Leroy and Weis [LW91] introduced a more direct, syntactic 
way of identifying and safely typing mutable objects using an extension of the Hindley/Milner 
type system. We discuss their technique below. 

Syntactic Analysis 

The key idea in the scheme proposed by Leroy and Weis is that the static type of a complex 
object can be used as a clue to its structural shape and dynamic properties (such as mutability) 
of its various components. For example, an i_vector type represents an assignable, I-structure 
array, while a vector type represents a functional array. This is exactly the information required 
to decide what parts of that object's type can be safely generalized. Note that this information 
is independent of when/where/how the object was created in the program and depends only 
on its static type structure. This analysis relies on the assumption that the type of an object 
remains visible from all places within the program where that object may actually end up. Then 
the generalization scheme is simply that the type variables present within an assignable portion 
of a type (such as t0 within a type built from (i_vector t0) and (list t1)) are considered to be dangerous and 
are not generalized, while all other type variables occurring elsewhere in the type (such as t1) 
are allowed to be generalized. 

The key assumption in the above scheme is that all objects can be viewed as data-structures 
whose component types are reflected back in the type of the overall object. In particular, objects 
captured inside the environment part of a function closure must also be made visible in the type 
of that function. Otherwise the mutability information of a datatype could be lost by placing 
it within a function closure. The following example illustrates this point (c.f. Example 2.3): 

Example 2.10: 

def fnref x =                        % fnref :: ∀t0. t0 -> (void -> t0, t0 -> void) 
   { r = mkref x; 
     def read () = r!!mkref_1; 
     def write y = { r!!mkref_1 = y }; 
    in 
     read, write }; 

reader, writer = fnref identity; 
_ = writer square; 
_ = reader () true;                  % Static Type Error! 

The function fnref emulates the functionality of the mutable constructor mkref of 
Example 2.3 by creating separate read and write handles to a shared mutable reference r. In the 
scheme proposed by Leroy and Weis, the function type-constructors (->) of read and write 
functions are augmented with closure types that expose the types of objects captured within 
their closure environments. Without closure types, it is impossible to tell from the types of 
the read and write functions that they share a mutable reference. Thus, closure types help in 
identifying hidden dangerous type variables and therefore avoid their unsound type 
generalizations. In the above example, when the function fnref is used to create the reader and writer 
handles to a hidden mutable reference to identity, their non-empty closure types correctly 
restrict their types to be non-polymorphic and the type-error can be detected. 

In general, closure types for a function must keep track of the type of all the free variables 
occurring within the function body, whether such types are dangerous or not. This is because 
such free variables may correspond to formal parameters of an enclosing function that may 
ultimately be instantiated with a mutable object at some application site. The following example 
illustrates this point: 

Example 2.11: 

% closure types are shown in [ ] on the arrow 
def K x = {fun y = x};            % K :: ∀t0 t1. t0 -> (t1 -[t0]-> t0) 

f = K (mkref identity);           % f :: ∀t1. t1 -[ref (t2 -> t2)]-> ref (t2 -> t2) 

The type of function f correctly generalizes t1 and not t2 because t2 occurs under a mutable 
type in its closure. This was possible only because we correctly kept track of the type t0 of the 
free variable x in the closure type of the body of the function K. 
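
The computation of dangerous type variables can be sketched as below. This Python fragment follows our reading of the scheme: every variable under a mutable constructor is dangerous, and for a function type only the variables that are dangerous in its closure type count. The tuple encoding of types, the set MUTABLE, the "tuple" and "closure" tags, and the helper fun are all assumptions made for illustration.

```python
MUTABLE = {"ref", "i_vector", "m_vector"}   # assumed set of assignable constructors

def is_var(ty):
    return isinstance(ty, str) and ty[:1] == "t" and ty[1:].isdigit()

def free_vars(ty):
    """All type variables of a type, including those inside closure types."""
    if isinstance(ty, str):
        return {ty} if is_var(ty) else set()
    return set().union(*(free_vars(part) for part in ty[1:]))

def dangerous_vars(ty):
    """Type variables that must not be generalized."""
    if isinstance(ty, str):
        return set()
    head = ty[0]
    if head in MUTABLE:
        return free_vars(ty)             # everything under a mutable constructor
    if head == "fun":
        return dangerous_vars(ty[2])     # only the closure type, not argument/result
    return set().union(*(dangerous_vars(part) for part in ty[1:]))

def generalizable_vars(ty):
    return free_vars(ty) - dangerous_vars(ty)

def fun(arg, res, *captured):
    """Function type carrying the types of its captured free variables."""
    return ("fun", arg, ("closure", *captured), res)

pair_ty = ("tuple", ("i_vector", "t0"), ("list", "t1"))
ref_id = ("ref", fun("t2", "t2"))
f_ty = fun("t1", ref_id, ref_id)         # f = K (mkref identity)
f_ty_no_closure = fun("t1", ref_id)      # same type with the closure type dropped
```

With the closure type present, only t1 may be generalized in the type of f; if the closure type is dropped, t2 would be generalized as well, which is exactly the unsoundness that closure typing prevents.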

Type Soundness and Type Inference Mechanism 

Leroy developed this idea in his thesis [Ler92] showing the soundness of a type system with 
closure types with respect to the dynamic operational semantics of an ML-like language with 
higher-order functions. He also showed a type inference algorithm based on this type system 
that is sound and infers principal types and closure types. 

The type inference for closure types turns out to be very similar to effect inference. A new 
class of variables called closure extension variables models the unknown closure types of 
higher-order functions just like effect-variables. There is some flexibility in deciding what closure type 
information really needs to be kept and what can be discarded. For example, it is possible to 
keep only dangerous and certain visible type variables within a closure type of a function instead 
of recording the full types of all its free variables. The algebra of Hindley/Milner types also 
has to be extended to incorporate extensible sets of closure types, including ways to compute 
dangerous type variables of a type and perform type substitution within closure types. The 
type inference mechanism then computes the usual Hindley/Milner types for all objects while 
accumulating a set of closure types for every function using simple structural unification. The 
interested readers may refer to [Ler92] for details. 

Analysis of the Closure Typing System 

Leroy's syntactic system also succeeds in giving the same polymorphic type to imp_map and 
fn_map functions just like the effect-based systems with effect masking. In his thesis [Ler92], 
Leroy makes some interesting comparisons of the expressive power of the various systems we 
have seen so far. His system turns out to be incomparable to the effect-based systems in 
power. This is not too surprising because his approach is semantically very different from the 
effect-based systems. 

2.2.7 Choosing an Imperative Type System 

As mentioned earlier, we have chosen the closure typing system of Leroy as a starting point 
for the typing extensions proposed in this thesis. In this section, we attempt to motivate this 
choice in the context of the various type systems we have seen above. 

The real choice is between an effect-based system (Section 2.2.5) or the closure typing system 
(Section 2.2.6). Type systems of Sections 2.2.2 and 2.2.3 and their extensions are either too 
simplistic in that they do not deal with higher-order functions properly or they suffer from the 
problem of type contamination. 

A requirement imposed by our ultimate goal to selectively convert some imperative objects 
into functional ones is that we should be able to uniquely label groups of imperative objects, 
recognize them independent of other objects, and track their movement within the program. 
Some sort of region-based analysis is necessary for this purpose. Either an effect-based system 
with regions may be used, or we may have to extend the closure typing system with regions. 

The contrast between the closure typing and the effect-based approaches may be understood 
by examining the way in which imperative type information is collected and propagated. In the 
closure typing system, the type of an object directly describes its imperative or functional 
composition. This is purely static, locally determinable, object-based information. This property is 
extended even to functions where closure types are added to function types in order to describe 
the data captured within the closure environment. This is very appealing because at any given 
moment, all the relevant information about an object is available from its type wherever that 
object (and hence its type) travels. We say that this approach is data-driven since it keeps the 
relevant properties directly attached to the types of the data objects. 

On the other hand, in an effect-based system the objects themselves are not classified as 
imperative or functional. We collect the operations performed on various kinds of objects in 
a separate effect. Such effects are carried over object manipulators (functions) as their latent 
effect. At any given moment, the properties of an object can be ascertained indirectly by 
examining the kind of effects it is participating in. Such a system is very good in summarizing 
dynamic properties of program fragments rather than describing the data itself. We say that 
this approach is control-driven since it keeps the relevant properties attached to types of control 
objects (functions). 

For our purpose, the data-driven approach is more direct and natural, since we are interested 
in determining and manipulating the imperative or functional nature of data objects directly. 
We need not separately keep track of the dynamic properties of the functions manipulating 
these objects. A sound, functional abstraction of an object can be built simply by changing 
the type of the object regardless of the way it is computed. Additional user information about 
an object should also be easy to incorporate into this system as long as we can show that such 
information preserves the type-safety of the static semantics. For these reasons we have 
chosen this type system as the basis of our extensions for converting imperative objects into 
functional ones, which we refer to as "closing" the imperative objects. 






2.3 Closing Imperative Data-Structures 

In this section we will informally describe what we mean by "closing" an imperative object and 
discuss several technical issues arising out of it. 

2.3.1 A Proposal for "Close" 

We observed in Section 2.1 that the returned array from Example 2.2 is mutable and must be 
assigned a restricted form of polymorphism. This restriction is necessary to achieve the desired 
type-safety in the following example: 

Example 2.12: 

def fill n = identity; 

a = make_vector fill (l,u);       % a :: (i_vector (t0 -> t0)) 

a[i] = square;                    % a :: (i_vector (int -> int)) 

_ = a[j] true;                    % Static Type Error! 

The Hindley/Milner type of the returned array is shown on the right where the type variable 
to occurs free and is not generalized. The assignment "a[i] = square;" refines the type of 
the array a as shown which correctly generates a type-error on encountering the subsequent 
application to true. This is necessary because the indices i and j may turn out to be 
the same at run-time, in which case this application would lead to a run-time type-error. All 
imperative type systems in the literature [Dam85, Tof90, AM89, LW91, Ler92, TJ92, Wri92] 
catch this type-error at compile-time by restricting the polymorphism of imperative objects in 
one way or another. 

"Close" as a Type Converter 

Although the above behavior for make_vector is correct, ultimately, we want it to behave like 
a functional array constructor that returns a non-mutable, polymorphic array. The interesting 
observation is that if we convert the type of the returned array from make_vector to be the 
functional type constructor vector, then all mutation operations on it are automatically made 
illegal since it must now be viewed as a functional object. In this case, we would have flagged 
a semantic error at the assignment "a[i] = square;". Since no more mutations are allowed 
on the array a, we may be able to safely generalize its type with respect to t0. Henceforth, 
we will refer to this type conversion and subsequent type generalization operation as "closing" an 
imperative object. 

We can rewrite the make_vector implementation to reflect the above strategy: 

Example 2.13: 

make_vector :: ∀t0. (int -> t0) -> (int, int) -> (vector t0) 
def make_vector f (l,u) = 
   close { a = i_vector (l,u); 
           _ = { for i <- l to u do 
                   a[i] = f i }; 
          in a }; 

The close construct in this implementation is intended to be a special form that captures 
our notion of closing an imperative object. It provides an alternate "functional view" for the 
imperative object. Users may use this construct in their programs to convert an imperative 
data-structure (like the array a above) into a functional one. Or, such conversions may be 
issued automatically by a compiler while desugaring high-level functional constructs into 
low-level imperative program fragments. In either case, the type of the object being closed is 
converted from a mutable to a non-mutable type constructor that permits its subsequent type 
generalization. 
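
To make the intended behavior concrete, the following Python sketch mimics make_vector under our own assumptions. The class ClosedVector is hypothetical and stands in for the functional vector type: it shares storage with the list that was filled imperatively and simply offers no write interface. Python can enforce this only dynamically and by convention, whereas the proposal here is a static guarantee.

```python
class ClosedVector:
    """Read-only view sharing storage with the imperatively built list."""

    def __init__(self, cells, low):
        self._cells = cells
        self._low = low

    def __getitem__(self, index):
        if not self._low <= index < self._low + len(self._cells):
            raise IndexError(index)
        return self._cells[index - self._low]

    def __len__(self):
        return len(self._cells)

def make_vector(f, bounds):
    low, high = bounds
    cells = [None] * (high - low + 1)    # imperative allocation
    for i in range(low, high + 1):
        cells[i - low] = f(i)            # imperative fill
    return ClosedVector(cells, low)      # "close": no copy, no write interface

v = make_vector(lambda i: i * i, (1, 4))
try:
    v[2] = 0                             # mutation after closing is rejected
    rejected = False
except TypeError:
    rejected = True
```

The allocation and the fill loop are confined to the body of make_vector, and the caller only ever sees the closed view.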

"Close" as an Encapsulator 

An important point to observe in Example 2.13 is that the close construct encapsulates the 
entire computation that allocates, fills, and returns the array a rather than acting merely as 
a marker for the array to be closed. Treating the close construct as an encapsulator clearly 
identifies the "scope" of the imperative operations being performed on the array a. Within this 
scope, imperative operations on the array are permitted, while outside this scope, the array 
is viewed functionally. This notation is useful both to the user, by providing a clear visual 
separation between the imperative and functional parts of the program, and to the compiler, 
which may need to compile these parts differently as well as verify the correctness of the close 
operation automatically. This implies that the following two expressions are not equivalent: 

close exp ≢ { x = exp; in close x } 

Here, exp stands for an imperative program fragment that allocates and prepares an imperative 
object for closing. The close construct on the left-hand side behaves like an encapsulator: it 
encapsulates the entire program fragment that builds the object imperatively and then returns 
it with a functional view. There is a clear separation between the imperative and the functional 
views of the object. In contrast, the close construct on the right-hand side identifies the object to 
be closed but does not clearly identify the program region where the close operation should 
take effect. Thus, it becomes difficult for the type system to verify the correctness of the close 
operation. The importance of this distinction will become clear shortly. 

As a matter of notation, when only some of the objects being returned from an expression 
are to be closed, we specify it in a type annotation for the entire expression, where some of the 
components are only partially supplied: 

Example 2.14: 

close { a = i_vector (1,n); 
        b = i_vector (1,n); 
       in a, b } :: (vector _),_; 

The underscore (_) within the annotation implies that the close operation does not apply to 
that particular component of the result. All other components of the result are closed according 
to the type specified. Thus, in the above example, the array a is closed into a functional vector 
while the array b remains open. The contents of the array a also remain unaffected. The exact 
details of this specification appear in Chapter 4. 

2.3.2 Guaranteeing Type-Safety 

The problem of closing imperative objects is not simply a matter of type conversion as it might 
appear from the above discussion. Note that the closing operation is type-safe only if the object 
does not escape the scope of the imperative implementation in any other way except via some 
controlled, safe paths. We saw above that if the only access to a polymorphic imperative object 
is through the returned result, then a type conversion allows us to do a type-safe generalization 
later on. But there are several other ways in which an object might escape a given scope, 
some of which are shown in the following example. Note that using the close construct as an 
encapsulator helps in identifying the escaping objects clearly: 

Example 2.15: 

def escape_1 n = 
   close { a = i_vector (1,n); 
           b[1] = a;                   % Storing into an external data structure 
          in a }; 

def escape_2 n = 
   close { a = i_vector (1,n); 
          in a, a } :: (vector _),_;   % Returning unconverted object directly 

def escape_3 n = 
   close { a = i_vector (1,n); 
           def g i v =                 % Returning a write handle within a closure 
              { a[i] = v; in v }; 
          in a, g } :: (vector _),_; 

In function escape_1, a reference to the locally allocated imperative array a is stored into an 
external array b. The type of the array b is constrained to be (i_vector (i_vector t0)), implying 
that the array a is still accessible in its open form through this indirection. In function escape_2, 
two references to the same array are returned: one is closed and the other is left open according 
to the specified annotation pattern. Mutations via the open reference will affect the type-safety 
of the closed version. The same effect is achieved in the function escape_3, although it is 
disguised in the form of a function that provides a write handle to the array being closed. 

The essential problem in the above examples is that it is safe to close a polymorphic mutable 
data-structure only if it is guaranteed that no write handle pointing to that object remains 
accessible to the user after it has been closed. Otherwise, the subsequent functional behavior 
implied by the close operation and its possible type generalization will both be unsound. 
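
The danger of a surviving write handle can be observed directly in a dynamic setting. The Python sketch below transliterates escape_3 under our own assumptions: a dictionary stands in for the I-structure array, and the standard-library types.MappingProxyType plays the role of the closed, read-only view that shares storage with it. The helper name close is hypothetical.

```python
from types import MappingProxyType

def close(cells):
    """Return a read-only view of cells without copying (stand-in for close)."""
    return MappingProxyType(cells)

def escape_3(n):
    a = {i: None for i in range(1, n + 1)}
    def g(i, v):              # write handle captured in a closure
        a[i] = v
        return v
    return close(a), g

closed, write = escape_3(3)
before = closed[1]
write(1, "mutated")           # the leaked handle updates the "closed" object
after = closed[1]
```

Although the view itself rejects assignment, its contents change behind the reader's back, so treating it as a functional (and possibly polymorphic) value would be unsound.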

All the imperative type systems in the literature automatically take care of such cases by 
avoiding generalization of imperative objects at all times. The trouble arises when we wish 
to force the type system to accept a functional, polymorphic type for an imperative object as 
implied by the close construct in the above examples. Then, either the user must be held 
responsible for the type-safety of the resulting program, or it becomes the responsibility of the 
type system (the compiler) to automatically verify the soundness of this transformation and 
reject the unsafe cases. 

2.3.3 Guaranteeing Non-Mutability 

Note that type-safety is an issue only for polymorphic imperative objects, i.e., imperative objects 
that have some potentially generalizable type-variables in their type. This is because the usual 
typing rules would ensure that all values assigned to monomorphic mutable objects would have 
compatible types. For example, all the functions in Example 2.15 would be type-safe if assumed 
monomorphic even if the array being returned was subsequently mutated. 






However, our intended meaning of the close operation is more than simply ensuring safe 
type generalizations. We want to enforce non-mutability of the returned data-structure, which 
is a much stronger property of dynamic semantics compared to the weaker property of merely 
avoiding run-time type-errors (type-safety). For polymorphic objects, non-mutability implies 
type-safety and vice versa, but that is not the case for monomorphic objects. As the preceding 
discussion shows, ensuring non-mutability involves a simple form of escape analysis on the 
part of the compiler which is conventionally performed using dataflow analysis or abstract 
interpretation [GP90, GPG91, HI89]. Indeed, all the imperative type systems in the literature 
concentrate on the issue of type-safety alone. 

In our case, we intend to model such a simple form of escape analysis for free using the 
existing machinery of our type system that is already required to ensure its type-safety. Our 
machinery ensures true functional semantics for successfully closed objects, i.e., such objects are 
guaranteed to be side-effect free and can participate in compiler optimizations such as common 
sub-expression elimination and code-hoisting that depend upon the objects being functional. 
Thus, the close construct serves as a true interface between the low-level, imperative layer and 
the high-level functional layer of the language. 

2.3.4 Efficiency and Parallelism 

Consider the following example adapted from [BNA91] that builds an n-bucket functional 
histogram of objects stored in a binary search tree. The search tree datatype is also shown below 
for convenience: 

Example 2.16: 

type tree t = leaf | node t (tree t) (tree t); 

def histogram t n = 
   close { a = m_vector (1,n); 
           _ = { for i <- 1 to n do 
                   a![i] = 0 }; 
           _ = accum t a n; 
           --- 
          in a }; 

def accum leaf a n = () 
 |  accum (node x l r) a n = 
      { i = hash x n; 
        a![i] = a![i] + 1; 
        _ = accum l a n; 
        _ = accum r a n; }; 

The histogram function allocates an empty mutable vector with n buckets and initializes 
each of the buckets to zero. The accum function uses pattern-matching to traverse the tree 
structure recursively and increments the count in the appropriate bucket.⁹ 

A couple of important observations can be made about the above example. First, all 
accumulations are made to the same mutable array which is closed and returned at the end. No 
copying is involved during accumulations or at the time of returning the final array. Most 
strongly-typed systems would only allow creating an internal mutable array to which 
accumulations are made, then copying the final tallies to a functional array which is returned. Hence, 
overall functional behavior is achieved at the cost of copying the final data-structure which 
may be quite expensive. The close construct automatically achieves the functionality without 
sacrificing the efficiency in such cases. 

⁹ The notation "a![i]" in Id denotes M-take/M-put operations on mutable arrays with read-and-lock/write-and-unlock 
semantics. The notation "---" denotes a local barrier. All the computation above the barrier must 
terminate before any of the computation below the barrier is allowed to proceed. See [BNA91, Bar92] for details. 

Second, all computations in Id are performed in parallel by default, constrained only by 
data-dependencies. In the above example, the histogram initialization and the entire tree 
accumulation can potentially be done in parallel. The close construct places no restrictions on 
the kind of parallel activities that can occur within the encapsulated expression; it simply 
closes and returns the final result. In a purely functional setting, some compilers would 
perform extensive destructive update analysis, linearity analysis, or use linear type systems, abstract 
datatypes or monadic language constructs [Blo89, Wad90, Hud92, PJW93, FPJ94] to 
determine that the histogram may be safely single-threaded through the computation and hence 
modified in place. Not only does this require a lot of compiler analysis, but single-threading 
the computation completely destroys the parallelism inherent in the problem. 

2.3.5 Termination of Side-Effects before "Close" 

Example 2.16 illustrates another important point. Given the parallel execution model of Id, 
we must wait until all the accumulations have completed before closing and returning the 
histogram array in order to guarantee that the returned array is not updated anymore. This is 
ensured by inserting a local barrier (---) before returning the histogram which waits for all the 
computations before the barrier (issued in the current scope) to terminate before proceeding 
to the computations after the barrier. The barrier may be considered as an independent 
synchronization operation necessary for closing mutable objects in the presence of parallel updates 
(as shown here), or it could be taken as part of the close operation itself. In the latter case, 
the close construct would behave like a strict encapsulator that releases the closed object 
only when the encapsulated computation has completely terminated, rather than as a mere 
type-converter. 
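
The interplay between parallel accumulation, the barrier, and the final close can be sketched with ordinary threads. The Python fragment below is only an analogy under our own assumptions: it accumulates over a flat list rather than a search tree, uses buckets 0 to n-1, replaces per-location M-take/M-put with a single lock, uses four worker threads, and lets joining the workers play the role of the barrier. The read-only view is again types.MappingProxyType, so no copy is made on return.

```python
import threading
from types import MappingProxyType

def histogram(items, n, workers=4):
    counts = {i: 0 for i in range(n)}
    lock = threading.Lock()               # stands in for M-take/M-put on a bucket

    def accum(chunk):
        for x in chunk:
            i = hash(x) % n
            with lock:
                counts[i] += 1

    threads = [threading.Thread(target=accum, args=(items[k::workers],))
               for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                          # barrier: all updates finish before closing
    return MappingProxyType(counts)       # close: read-only view, no copy

h = histogram(list(range(100)), 10)
```

If the join were omitted, the returned view could still change after the caller receives it, which is precisely the situation the barrier rules out.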

Readers may have noticed that we did not use barriers in Examples 2.13 and 2.15. 
This is because of the different underlying memory access protocols being used for the objects 
in those examples. Examples 2.13 and 2.15 use I-structure arrays, while Example 2.16 uses 
an M-structure array. A barrier may be necessary when the memory access protocol used for 
implementing an imperative object is not the same as that of the corresponding closed object. 
We discuss the various memory access protocols below. 

Memory Access Protocols 

Id defines three classes of data-structures at the language level: Functional, I-structure, and
M-structure. Functional data-structures are read-only, I-structures are write-once, and
M-structures allow multiple updates. At the architecture level, these data-structures map into the
following three kinds of memory access protocols:

Unsynchronized Memory Access: This is the ordinary load/store memory access used
in conventional architectures. Each memory transaction is assumed to be exclusive and
non-blocking. There is no synchronization of any kind between readers and writers.

I-Structure Synchronization: The I-structure protocol [ANP89] enforces producer-consumer
synchronization between a single writer and multiple readers using full/empty presence
bits on memory locations. A location is deemed empty initially. Multiple readers may
issue I-fetches all of which block until the single writer performs an I-store changing the
state of the location to full. The stored data is then distributed to all the blocked and
subsequent readers. Multiple writes to the same location are considered to be an error.

M-Structure Synchronization: The M-structure protocol [BNA91, Bar92] enforces
mutual-exclusion synchronization among multiple readers and writers. Readers issue M-take
operations on full memory locations that read the location and leave it empty. A subsequent
M-put on the location restores the status to full and makes the data available to other
readers. It is possible to allow only one outstanding M-put operation and several M-takes
waiting to succeed as done in Id, or one could queue up both M-takes and M-puts and
match them up.
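The two synchronizing protocols described above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the class and method names (IStructureCell, MStructureCell, i_fetch, i_store, m_take, m_put) are hypothetical, blocked readers are modeled as callbacks rather than suspended threads, and the M-structure cell models the variant with a single outstanding M-put.

```python
class IStructureCell:
    """Write-once location with a full/empty presence bit (sequential model)."""

    def __init__(self):
        self.full = False
        self.value = None
        self.deferred = []  # readers blocked on the empty location

    def i_fetch(self, reader):
        # `reader` is a callback that receives the value once it is present.
        if self.full:
            reader(self.value)
        else:
            self.deferred.append(reader)

    def i_store(self, value):
        if self.full:
            raise RuntimeError("multiple I-stores to the same location")
        self.full, self.value = True, value
        pending, self.deferred = self.deferred, []
        for reader in pending:  # distribute the data to all blocked readers
            reader(value)


class MStructureCell:
    """Take/put location; models the single-outstanding-put variant."""

    def __init__(self):
        self.full = False
        self.value = None
        self.waiting_takes = []  # M-takes waiting for the location to fill

    def m_take(self, taker):
        if self.full:
            value, self.value, self.full = self.value, None, False
            taker(value)  # the take leaves the location empty
        else:
            self.waiting_takes.append(taker)

    def m_put(self, value):
        if self.full:
            raise RuntimeError("M-put on a full location")
        if self.waiting_takes:
            # Hand the value to the oldest waiting take; location stays empty.
            self.waiting_takes.pop(0)(value)
        else:
            self.full, self.value = True, value
```

In this model an I-fetch issued before the I-store is simply deferred, which is why outstanding I-fetches never invalidate a later functional view of the same location, whereas every M-take changes the state of the location.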

I-structure and M-structure objects are implemented using their respective memory access
protocols, but functional objects may be implemented using either unsynchronized or I-structure
access protocol. However, intuitively it should be clear that a given object cannot be accessed
using two different protocols simultaneously; that would lead to a run-time error. Therefore,
it becomes necessary to ensure that all in-flight imperative operations on an object have
terminated before it is closed and accessed as a functional object. A barrier may be inserted just
before the close operation in order to guarantee this.

Note that we only have to wait for the termination of all memory operations issued from
within the scope of the close construct because we already ensure that no imperative handle to
the object being closed can escape this scope. Of course, no barrier is needed if the underlying
memory access protocol remains the same when changing from an imperative to a functional
view of the same object. For instance, currently the Id compiler uses the I-structure protocol
to implement all functional objects. Therefore, no barrier is needed when closing I-structure
objects into functional objects (Examples 2.13 and 2.15), whereas a barrier is required when
closing M-structure objects into functional objects that use the I-structure protocol
(Example 2.16).

Protocol Conversions 

Figure 2.1 depicts all possible protocol conversions at the time of closing an object. An
imperative object may be implemented using any one of the three memory access protocols, while
a functional object may use either the unsynchronized read protocol or the I-structure read
protocol. The arrows depict the protocol conversion implied by the close operation. The
annotations on the arrows summarize the kind of barrier required, if any, for the underlying
protocol conversion. We discuss the various cases below.

When closing an unsynchronized mutable object into an unsynchronized functional object
(refer to Figure 2.1), we need to make sure that all previously issued write operations have
terminated. Otherwise, the closed object may get updated after being closed. This is enforced
by using a write-barrier before closing the mutable object.

Although the Id compiler uses the I-structure protocol to implement all functional objects,
it is possible to implement functional objects that are known to be strict without any
synchronization. It is also possible to introduce unsynchronized objects as another primitive data class
within the language that need not pay the significant overhead of I-structure synchronization,
especially when it is emulated in software. In this situation, a write-barrier is necessary when
closing an I-structure object into an unsynchronized object. Otherwise, subsequent
unsynchronized read operations would not see the effect of any outstanding I-store operations. However,
any outstanding I-fetch operations can always be satisfied using the data that is already present,
therefore we need not wait for any outstanding I-fetch operations to terminate before closing
the object.

Mutable object protocol    Functional object protocol    Barrier required

Unsynchronized             Strict, Unsynchronized        Write-Barrier
I-Structure                Strict, Unsynchronized        Write-Barrier
M-Structure                Strict, Unsynchronized        Full Barrier
I-Structure                I-Structure                   No Barrier
M-Structure                I-Structure                   Single Outstanding Put: Take-Barrier
M-Structure                I-Structure                   Multiple Outstanding Puts: Full Barrier

Figure 2.1: Conversions among Synchronization Protocols at the time of Closing.

When closing M-structure objects into unsynchronized functional objects, it is clear that
we must wait for both M-take and M-put operations to terminate before accessing the object
with unsynchronized read operations. This is because both M-take and M-put may modify the
actual contents of the memory location and all such modifications must complete before it is
safe to use the object functionally.

We already mentioned that the I-structure protocol is currently used within the Id compiler
to implement both functional and I-structure objects. The only difference between the two
at the language-level is that functional objects are allocated and completely defined at the
same time and then subsequently used in a read-only fashion, while I-structure objects may be
allocated and then independently filled via assignment anywhere within the program. Since the
underlying synchronization protocol is the same in both cases, no barriers are necessary when
closing an I-structure object into a functional object.

Finally, while closing M-structure objects into functional objects that are implemented using
the I-structure protocol, if only one outstanding put is allowed, then it is possible to use only a
take-barrier [Bar92] instead of the usual full barrier. This is because once a location is empty
after a successful M-take operation, multiple functional I-fetches may be allowed to queue up
and the ensuing M-put can be made to satisfy them just like an I-store would.
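The cases discussed above can be collected into a small lookup routine. The following is a minimal sketch, not part of the Id compiler: the function name barrier_for_close, the protocol labels "unsync", "istruct" and "mstruct", and the returned barrier labels are all hypothetical names chosen for illustration. The conversion from an unsynchronized mutable object to an I-structure functional object is not discussed in the text, so the sketch rejects it rather than guess.

```python
def barrier_for_close(mutable, functional, single_outstanding_put=True):
    """Return the barrier needed when closing `mutable` into `functional`.

    Protocol labels (hypothetical): "unsync", "istruct", "mstruct".
    """
    if (mutable, functional) == ("mstruct", "istruct"):
        # With one outstanding M-put, waiting for the M-takes is enough.
        return "take-barrier" if single_outstanding_put else "full-barrier"

    table = {
        ("unsync", "unsync"): "write-barrier",    # pending stores must finish
        ("istruct", "unsync"): "write-barrier",   # pending I-stores must finish
        ("mstruct", "unsync"): "full-barrier",    # both M-takes and M-puts
        ("istruct", "istruct"): "no-barrier",     # same protocol on both sides
    }
    try:
        return table[(mutable, functional)]
    except KeyError:
        raise ValueError(
            f"conversion {mutable} -> {functional} is not covered by the text"
        ) from None
```

For instance, the histogram of Example 2.16 corresponds to closing an "mstruct" object into an "istruct" functional object, so some barrier is always required there.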

Discussion 

Since there are so many possibilities due to variations in data classes, synchronization
protocols and their implementations, henceforth, we shall assume that the close construct is always
explicitly or implicitly accompanied with the appropriate barrier where necessary. The main
thrust of our research is to guarantee type-safety and dynamic non-mutability via static
analysis which is orthogonal to the issue of guaranteeing dynamic termination of parallel update
operations upon closing an object. Therefore, in the rest of this thesis, we will only
concentrate upon strict, sequential, unsynchronized accesses to memory as done in most conventional
languages.

2.4 Sound Typings for Imperative/Closed Objects

As discussed in the last section, our overall strategy for closing imperative objects can be
summarized as follows:

1. First, we have to model the "imperativeness" of objects within the type system.

2. Next, we develop sound verification criteria for the type system under which an object
can be safely closed.

3. Finally, we apply the criteria to each object being closed at compile-time, verify the safety
of closing and convert the type of the object appropriately if the verification succeeds.
Otherwise, we raise a static "close-error".

Following the above outline, in this section we informally discuss the typing machinery
required for describing and closing imperative objects and present a set of closing strategies
under which this operation can be done safely. These strategies form the basis of the formal
static and dynamic semantics presented in the next chapter. We also touch upon some language
design issues that will be discussed in greater detail in Chapter 4.

2.4.1 Modeling "Imperativeness" in Types 

In Section 2.2.7, we motivated our choice of the closure typing system of Leroy as a starting point
for the typing extensions being proposed in this thesis. We also mentioned that we will need
some sort of region-based analysis in order to distinguish among various kinds of imperative
and functional objects. In this section, we informally describe this type representation.

Our approach takes a middle ground between the effect and the closure typing system of
Leroy. We model the "imperativeness" of an object using parameterized type constructors where
a simple region expression is attached to each type constructor that identifies whether or not
that constructor is imperative. A region expression ρ is either a region variable r or the null
region ε. The intuitive idea is that a type constructor with a null region is considered to be
functional, while the presence of a region variable identifies it to be imperative (cf. closure
typing system) as well as provides an abstraction for a set of locations associated with that
object (cf. effects system with regions). Another way to look at this is that a non-null region
expression associated with a type constructor ensures a read/write capability over the objects
of that type, while a null region provides a read-only capability over the objects of that type.

As an example, the type of the application (mkref identity) (Example 2.3) is shown
below under various type systems (see footnote 10):

Type System                              Type of (mkref identity)

Standard ML ([MTH90, Tof90])             ref (u -> u)
Standard ML/NJ ([AM89, HMV93])           ref (u° -> u°)
Simple Effects ([Wri92])                 ref (t -> t) with effect {alloc{t, /)}
Effects with regions ([TJ92])            ref (t -> t) with effect {alloc r (t -> t)}
Closure Type ([Ler92])                   ref (t -^ t)
Closure Type with regions (this work)    ref(r) (t -^ t)

10. Standard ML notation [MTH90] uses postfix type constructors in type expressions, as in
(u -> u) ref. We will follow that notation in Chapter 3 when discussing formal semantics. For
now, we use prefix type constructors since they are more intuitive.



In our representation (the last row), the type constructor ref is accompanied by a new
unique region variable at every application of mkref within the program. These region variables
participate in type unification thereby abstractly keeping track of the set of statically aliased
reference locations and their scope of accessibility rather than relying on various classes of type
variables or a separate set of effects.

The advantage of this representation is that it allows us to close the type of an imperative
object by simply replacing the appropriate region variables in a type constructor by the null
region ε under suitable conditions. The "imperativeness" of an object can still be determined
syntactically by examining its parameterized type constructor so we are still following the
closure typing system; no separate effects need to be collected.
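The representation described above can be sketched as a small data type. This is an illustration under stated assumptions rather than the formal system of Chapter 3: the names TVar, TCon, is_imperative and close_regions are hypothetical, Python's None stands in for the null region, and function types are treated as ordinary constructors without their closure-type annotations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class TVar:
    """A type variable such as t."""
    name: str


@dataclass(frozen=True)
class TCon:
    """A type constructor tagged with a region expression.

    region is a region-variable name, or None for the null region.
    """
    name: str
    region: Optional[str]
    args: Tuple["Type", ...] = ()


Type = Union[TVar, TCon]


def is_imperative(t: Type) -> bool:
    # Imperativeness is read off the constructor syntactically.
    return isinstance(t, TCon) and t.region is not None


def close_regions(t: Type, regions: set) -> Type:
    """Replace every region variable in `regions` by the null region."""
    if isinstance(t, TVar):
        return t
    region = None if t.region in regions else t.region
    return TCon(t.name, region, tuple(close_regions(a, regions) for a in t.args))
```

Closing ref(r) (t -> t) with respect to the set {r} yields the same constructor with a null region, which is_imperative then reports as functional. Whether the replacement is permitted at all is decided separately by the closing strategies.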

Furthermore, a direct correspondence can be established between a user-defined imperative
type constructor that is parameterized by one or more non-null regions and a completely
functional version of the same type constructor by simply erasing all its qualifying region expressions
without disturbing the type constructor itself. For instance, now we can define just a single
parameterized array datatype vector(ρ) where a region ρ = r represents an I-structure array
and a region ρ = ε represents a functional array (see footnote 11). The functional type constructor
vector is now considered to be a type synonym for vector(ε).

Having independent region variables also separates the issue of type polymorphism from
non-mutability quite well. The imperativeness of an object is reflected in the regions associated with
its type constructor and not in its polymorphic type variables. Indeed, imperative properties
of a monomorphic type such as point given below can also be accurately represented and
manipulated:

Example 2.17: 

type point = pt ! float ! float; 

The type constructor point will be parameterized by two region variables, point(r1, r2),
representing the fact that it has two mutable fields that can be closed independently. The exact
association of region variables to mutable fields can be specified explicitly within the type
declaration or defined implicitly. We will come back to these language design issues in Chapter 4.
Now, let us look at some sound verification strategies for closing imperative objects.

11. However, M-structure arrays would still require a separate type constructor in order to
distinguish them from I-structure arrays. We will come back to this issue in Chapter 4.

2.4.2 Handling the Environment

Once an imperative object is created and is made accessible as part of the environment at a
particular scope, it is nearly impossible to close it safely at that scope or at any scope lexically
inside it because many other objects may already hold a write handle to it. That is why we
specified the close construct as an encapsulator of the entire program fragment that constructs
the imperative object (Section 2.3.1) rather than as a mere type-converter. This situation is
further complicated by the fact that the scope of accessibility of a mutable object is not always
the same as the scope of its allocation because a locally created object may be made accessible
non-locally by storing it into a global data-structure. The function escape_1 of Example 2.15
illustrates this problem. A write handle to the locally allocated object a is made accessible by
storing it into the global object b. Now anybody looking at b could get hold of a and assign
into it. Therefore, it is not safe to close or generalize a when it is returned from the function
escape_1.

Fortunately, modeling the imperativeness of an object using region variables allows us to
detect this situation statically. The region variable associated with the type of an imperative
object becomes visible in the enclosing type environment when that object is exported into the
enclosing value environment. This is illustrated below:

Example 2.18: 

    b = i_vector (1,1);                  % b :: (vector(r1) (vector(r2) t))

    def escape_1 n =
      close { a = i_vector (1,n);
              b[1] = a;                  % a :: (vector(r2) t)
              in a };                    % Unsafe close detected.

The assignment (b[1] = a) causes the region variable r2 contained within the type of the
array a to become visible in the type environment enclosing the close construct through the
type of array b. This fact may be used as a static test while typing the close construct to
detect such escaping objects. This is summarized in the following typing strategy:

Closing Strategy 1 An object may be safely closed at the lowest lexical scope higher than the 
scope of its creation at which none of the region variables contained in its type occur free in the 
type environment. 
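The test expressed by Strategy 1 can be sketched as a disjointness check on region variables. The encoding below is an assumption made for illustration only: a type variable is a Python string, a constructed type is a tuple (constructor, region, argument types...) with None for the null region, and the helper names regions and can_close_strategy1 are hypothetical.

```python
def regions(t):
    """Collect the region variables occurring in a type."""
    if isinstance(t, str):          # a type variable carries no region
        return set()
    _constructor, region, *args = t
    found = set() if region is None else {region}
    for arg in args:
        found |= regions(arg)
    return found


def can_close_strategy1(obj_type, type_env):
    """True if no region variable of obj_type occurs in the type environment."""
    env_regions = set()
    for t in type_env.values():
        env_regions |= regions(t)
    return regions(obj_type).isdisjoint(env_regions)


# Example 2.18: b :: vector(r1) (vector(r2) t) is visible outside the close,
# so the array a :: vector(r2) t cannot be closed.
env = {"b": ("vector", "r1", ("vector", "r2", "t"))}
a_type = ("vector", "r2", "t")
print(can_close_strategy1(a_type, env))   # False
print(can_close_strategy1(a_type, {}))    # True
```

The check is purely syntactic, so it is conservative: it also rejects a close whenever unification has merged the object's region variable with one in the environment, even if no write handle actually escapes.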

Sometimes, the type of a mutable object can escape into the type environment without
actually leaving a write handle around. This may happen if the type of the mutable object
is shared with some other global object due to type-unification. This phenomenon is called
region-aliasing and is illustrated in the following example:

Example 2.19:

    a = i_vector (1,1);                  % a :: (vector(r1) t)

    b = close { c = i_vector (1,2);      % c :: (vector(r2) t)
                d = if ... then a else c;
                in c };                  % Cannot close due to region-aliasing.

In the above example, the typing of the conditional expression unifies the region variables
r1 and r2 of the arrays a and c respectively. Now, according to Strategy 1, the array c cannot
be closed because its region variable is visible in the enclosing type environment even though
the array itself does not escape into the enclosing scope in any way. Such cases are unavoidable
in a conservative, static type inference system.

2.4.3 Handling Structured Results 

Until now we were considering cases where a single, flat, local data-structure is closed and
returned. The function escape_2 in Example 2.15 illustrates the case when the mutable object
to be closed is returned as part of another object. In general, multiple objects could be closed
and returned from a scope and all of them would have to be verified for safety simultaneously
because they may refer to each other.

In function escape_2 of Example 2.15, the returned object is a 2-tuple both of whose
components point to the same shared array a. Since the second component of the return tuple
provides a write handle to the same array, closing its first component should be illegal. We
reproduce the example below:

Example 2.20: 

    def escape_2 n =
      close { a = i_vector (1,n);        % a :: (vector(r) t)
              in a, a } :: (vector _),_;

In terms of types, we observe that the region variable r in the type of array a would get
erased in the type of the first component (due to close) but would still be present in the type of
the second component. This fact can be used to detect such escaping write handles as expressed
in the following typing strategy:

Closing Strategy 2 Local data returned from a scope is allowed to be closed only if none of 
the region variables being closed occur free in the remaining type of the returned data. 

The above strategy stresses two important points. First, we must specify exactly which
occurrences of region variables we are interested in closing. In some sense, this requires us to
specify exactly which fields or locations in a mutable object we are interested in closing.

Second, there should not be any way to access the open version of the object being closed
via the contents of the object itself. Since the structure of an object is reflected in its type, this
check can be performed statically by testing whether any of the region variables being closed are
visible in the type of the rest of the object. Note that this does not preclude the possibility of
closing recursive or cross-referenced mutable objects. The only restriction is that all references
to the same mutable object must be closed simultaneously, otherwise the close operation will
not be safe.
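The second check can be sketched for the simple case of a returned tuple in which whole components are selected for closing. As before, the encoding is an illustrative assumption: type variables are strings, constructed types are tuples (constructor, region, argument types...) with None as the null region, and the names regions and can_close_strategy2 are hypothetical.

```python
def regions(t):
    """Collect the region variables occurring in a type."""
    if isinstance(t, str):
        return set()
    _constructor, region, *args = t
    found = set() if region is None else {region}
    for arg in args:
        found |= regions(arg)
    return found


def can_close_strategy2(components, closed_positions):
    """Check Strategy 2 on the component types of a returned tuple.

    closed_positions holds the indices of the components being closed.
    """
    closing, remaining = set(), set()
    for i, t in enumerate(components):
        if i in closed_positions:
            closing |= regions(t)
        else:
            remaining |= regions(t)
    # No region being closed may stay visible in the rest of the result.
    return closing.isdisjoint(remaining)


vec_r = ("vector", "r", "t")
print(can_close_strategy2([vec_r, vec_r], {0}))      # False: escape_2
print(can_close_strategy2([vec_r, vec_r], {0, 1}))   # True: both closed together
```

The second call reflects the remark that references to the same mutable object may be closed as long as they are all closed simultaneously.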

Note that the function escape_2 would be acceptable if both the write handles being re- 
turned were closed at the same time. The following example would also be acceptable since the 
region variables associated with the types of a and b are unrelated: 

Example 2.21: 

    def escape_2' n =
      close { a = i_vector (1,n);
              b = i_vector (1,n);
              in a, b } :: (vector _),_;

Here a may be closed successfully and converted into a functional data-structure while b 
remains mutable and is typed in the usual way. 

2.4.4 Handling Functions 

The simple Strategy 2 works well with explicitly nested, first-order data-structures like tuples
and arrays. Function closures present a different problem as illustrated by the definition of
escape_3 in Example 2.15. Here, a write handle to the array a escapes within the definition of
the function g. The ordinary Hindley/Milner type of the function cannot capture this fact at
all since it only records the types of arguments and the result of the function.

This is where Leroy's closure typing information carried on the function type proves useful.
Using the closure type, we can easily determine if the region variable being closed is present
within the returned closure. If so, then the close operation fails. With this addition,
Strategy 2 will be able to detect the escape of the write handle to array a from escape_3 within
the closure type of the function g.

Note that in the closure typing system, there is no way to distinguish between a function
reading from a mutable object and another that writes to it. Therefore, all such functions are
conservatively considered to be potential writers and the region variables contained within their
closure types should never be closed. This is expressed in the following strategy:

Closing Strategy 3 Region variables occurring within the closure type of a function are never 
allowed to be closed. 

In a more expressive effect-based system [TJ92], one might be able to separate functions
that only read from a mutable object from those that both read and write the object. In that
case, only the latter class is a candidate for potential type-safety violation; the functions that
only read from a mutable object may be allowed to close those objects.

2.5 Summary 

To summarize, we have informally shown above how to extend a state of the art imperative
type system [Ler92] with a type abstraction mechanism that can be used to convert imperative
objects into functional objects in a type-safe and transparent manner without the loss of storage
efficiency or parallelism. Specifically, we have proposed a new type-domain construct called
close that controls this type abstraction as a program encapsulator. We have informally
shown several typical uses of such a facility, discussed its implications on efficiency, parallelism
and dynamic memory access protocols, and outlined possible strategies to verify its correctness
within the type system. Finally, we have also given a flavor of the kind of syntactic and semantic
machinery that may be required to express, propagate and analyze such information. The next
chapter formalizes these ideas in the context of a polymorphic, strict, sequential language and
shows a soundness theorem guaranteeing that closed objects verified by our type system cannot
be updated during evaluation.

Our guiding principle behind this approach has been to engineer a practically useful notion 
of encapsulating imperative programs and data-structures into functional abstractions. Our 
ideas are geared more towards simplicity and run-time performance of user programs (space 
efficiency and preserving parallelism) rather than towards sheer expressive power of the type 
system. 






Chapter 3 

Semantics of "Close" 



In this chapter, we describe the semantics of the close operation. This semantics is presented
in the framework of a small kernel language that supports recursive functions, tuples, and
simple reference locations. In Chapter 4, we will extend this system to handle more general
data-structures such as arrays and algebraic types. Our type system is a direct extension of
the Closure Typing system presented in Chapter 3 of Xavier Leroy's Ph.D. thesis [Ler92].

We present the static and the dynamic semantics of our kernel language and show a
correspondence between the two in the form of a soundness theorem (Theorem 3.16). This is our
main result. It gives us the guarantee that well-typed terms do not run into run-time
type-errors. The theorem also implies that mutable objects can be safely considered to be functional
once they are successfully closed, i.e., in a type-correct program it is impossible to update an
object that has been closed by the type system (Corollaries 3.17 and 3.18). Finally, we use the
same type inference algorithm as described in [Ler92] that infers the correct and most general
type of every expression in the program.

As far as possible, we have kept the same mathematical notation as used in [Ler92].
Throughout this thesis, all symbols appearing in typewriter font are taken verbatim. They
denote syntactic entities that stand for themselves. Symbols appearing in SMALL CAPITALS
denote classes of objects. Unless specified otherwise, Greek symbols and symbols appearing
in italics stand for meta-variables that can be replaced with specific object instances in their
class.

3.1 Kernel Expression Language 

3.1.1 Expression Syntax 

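The EXPRESSION language is defined below:

\[
\begin{array}{lrcll}
\text{EXPRESSIONS:} & a & ::= & c & \text{constant} \\
 & & \mid & x & \text{identifier} \\
 & & \mid & op(a) & \text{primitive application} \\
 & & \mid & f\ \texttt{where}\ f(x) = a & \text{recursive function} \\
 & & \mid & a_1\ a_2 & \text{application} \\
 & & \mid & \texttt{let}\ x = a_1\ \texttt{in}\ a_2 & \text{let-binding} \\
 & & \mid & (a_1, \ldots, a_n) & n\text{-tuple} \\
 & & \mid & \texttt{close}\ a & \text{close expression}
\end{array}
\]
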

In this grammar, x and f range over an infinite set of IDENTIFIERS; c ranges over a
predefined set of CONSTANTS including unit (()), boolean (true, false) and integer (..., -1, 0, 1, ...)
constants.

In the expression op(a), op ranges over a predefined set of OPERATORS including the usual
arithmetic and comparison operators, i-th element projection operators for n-tuples, and a
ternary conditional operator. This set also includes the primitive operators to allocate (ref),
dereference (!) and assign (:=) mutable reference locations that will be described later in more
detail. In general, arguments of multi-arity operators are supplied as tuples (see footnote 1), but we
will freely use special syntax for some common operators, for example (if...then...else...) for the
conditional operator, (x := v) for reference assignment, and simple pattern matching for tuple
projection.

The expression f where f(x) = a denotes user-defined recursive functions. The identifier
f can occur inside the expression a. This makes our small language more realistic and allows
us to provide meaningful examples. The let construct is the source of polymorphism in this
language. In some of our Id examples, we represent several let-bindings together in a block
enclosed within braces ({}). Finally, we have added the close construct that enforces functional
behavior on the data-structure being returned from the expression a.

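The set of FREE IDENTIFIERS of an expression a is denoted by $\mathcal{F}(a)$ and is computed in the
usual manner as shown below:

\[
\begin{array}{ll}
\mathcal{F}(c) = \emptyset &
\mathcal{F}(f\ \texttt{where}\ f(x) = a) = \mathcal{F}(a) \setminus \{f, x\} \\
\mathcal{F}(x) = \{x\} &
\mathcal{F}(\texttt{let}\ x = a_1\ \texttt{in}\ a_2) = \mathcal{F}(a_1) \cup (\mathcal{F}(a_2) \setminus \{x\}) \\
\mathcal{F}(op(a)) = \mathcal{F}(a) &
\mathcal{F}(a_1, \ldots, a_n) = \bigcup_{1 \le i \le n} \mathcal{F}(a_i) \\
\mathcal{F}(a_1\ a_2) = \mathcal{F}(a_1) \cup \mathcal{F}(a_2) &
\mathcal{F}(\texttt{close}\ a) = \mathcal{F}(a)
\end{array}
\]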
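The equations above translate directly into a recursive function. The sketch below assumes a hypothetical tuple encoding of kernel expressions, with tags "const", "var", "op", "fun", "app", "let", "tuple" and "close"; neither the encoding nor the name free_ids comes from the thesis.

```python
def free_ids(a):
    """Free identifiers of a kernel expression in a tagged-tuple encoding."""
    tag = a[0]
    if tag == "const":                      # ("const", c)
        return set()
    if tag == "var":                        # ("var", x)
        return {a[1]}
    if tag == "op":                         # ("op", name, argument)
        return free_ids(a[2])
    if tag == "fun":                        # ("fun", f, x, body): f where f(x) = body
        _, f, x, body = a
        return free_ids(body) - {f, x}
    if tag == "app":                        # ("app", a1, a2)
        return free_ids(a[1]) | free_ids(a[2])
    if tag == "let":                        # ("let", x, a1, a2)
        _, x, a1, a2 = a
        return free_ids(a1) | (free_ids(a2) - {x})
    if tag == "tuple":                      # ("tuple", [a1, ..., an])
        result = set()
        for component in a[1]:
            result |= free_ids(component)
        return result
    if tag == "close":                      # ("close", a)
        return free_ids(a[1])
    raise ValueError(f"unknown expression tag: {tag!r}")
```

Note that the let case removes x only from the body, so an occurrence of x in the bound expression remains free, and that close is transparent with respect to free identifiers.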

3.1.2 Dynamic Semantics 

The dynamic semantics of the above language is defined using relational semantics. We define a
predicate relation between syntactic expressions and results that tells whether a given expression
can evaluate to a given result. This relation, called EVALUATION JUDGMENT, is of the following
form:

\[ e \vdash a/s \Rightarrow r \]

Here e is an ENVIRONMENT, s is an initial STORE, and r is the RESULT of evaluating the
expression a under the environment e and the initial store s. Evaluation judgments are established
using a system of axioms and inference rules. This technique is also known as "Structured
Operational Semantics" (SOS) [Plo81].

Semantic Objects 

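First, we define the semantic objects used in the dynamic semantics:

\[
\begin{array}{lrcll}
\text{RESULTS:} & r & ::= & v/s & \text{value and result store} \\
 & & \mid & \texttt{err} & \text{error} \\
\text{VALUES:} & v & ::= & c & \text{constant} \\
 & & \mid & \langle n\text{-tup}\ v_1, \ldots, v_n \rangle & n\text{-tuple} \\
 & & \mid & \langle \text{clsr}\ f, x, a, e \rangle & \text{function closure} \\
 & & \mid & l & \text{store location} \\
\text{STORABLE VALUES:} & w & ::= & v, \texttt{rw} & \text{read/write value} \\
 & & \mid & v, \texttt{ro} & \text{read-only value} \\
\text{ENVIRONMENTS:} & e & ::= & \{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n\} & \\
\text{STORES:} & s & ::= & \{l_1 \mapsto w_1, \ldots, l_n \mapsto w_n\} &
\end{array}
\]

1. Primitive operators are not allowed to be curried.
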

An evaluation can either result in a type-error or it produces a well defined value along with
the final store. A well defined value is either a constant base value, a tuple of values, a function
closure, or a store location.

Environments bind free identifiers of an expression to values. Stores map locations to
storable values that consist of a value and a tag that denotes whether that location has
read/write or read-only semantics. This flag is used in defining the semantics of the close
construct. We assume selector functions value(w) and tag(w) that select the value and tag
respectively from a storable value.

Both stores and environments are finite mappings that support the following operations: 

Notation 3.1

1. For any mapping F, we denote the DOMAIN of F by Dom(F) and its RANGE by CoDom(F).

2. The extension of a mapping F at the domain point p with a range value q is written as
$F + \{p \mapsto q\}$ and is defined in the usual way:

\[
(F + \{p \mapsto q\})(x) =
\begin{cases}
q & \text{if } x = p \\
F(x) & \text{otherwise}
\end{cases}
\]

3. The restriction of a mapping F to the domain A, where $A \subseteq Dom(F)$, is denoted by $F|_A$.

4. A finite mapping $F = \{p_1 \mapsto q_1, \ldots, p_n \mapsto q_n\}$ is considered to be undefined outside its
domain $\{p_1, \ldots, p_n\}$ unless specified otherwise.

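Given a value v, we inductively define $\mathcal{L}(v)$ to be the set of all locations directly contained
within it:

\[
\begin{array}{rcl}
\mathcal{L}(c) & = & \emptyset \\
\mathcal{L}(\langle n\text{-tup}\ v_1, \ldots, v_n \rangle) & = & \bigcup_{1 \le i \le n} \mathcal{L}(v_i) \\
\mathcal{L}(\langle \text{clsr}\ f, x, a, e \rangle) & = & \mathcal{L}(e) \\
\mathcal{L}(l) & = & \{l\}
\end{array}
\]

For an environment $e = \{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n\}$, we define
$\mathcal{L}(e) = \bigcup_{1 \le i \le n} \mathcal{L}(v_i)$.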

We define the set of locations reachable from a given object with respect to a given store as
follows:

Definition 3.2 (Reachability) Given a value v and a store s, we define Reachable(v, s) to
be the set of all locations within the domain of s that are either directly contained within v or
transitively contained in a value stored at such a location via the store s. This extends naturally
(pointwise) to values present in an environment e.

\[
\begin{array}{rcll}
Reachable(c, s) & = & \emptyset & \\
Reachable(\langle n\text{-tup}\ v_1, \ldots, v_n \rangle, s) & = & \bigcup_{1 \le i \le n} Reachable(v_i, s) & \\
Reachable(\langle \text{clsr}\ f, x, a, e \rangle, s) & = & Reachable(e, s) & \\
Reachable(l, s) & = & \emptyset & \text{if } l \notin Dom(s) \\
Reachable(l, s) & = & \{l\} \cup Reachable(v', s) & \text{if } value(s(l)) = v' \\
Reachable(e, s) & = & \bigcup_{1 \le i \le n} Reachable(v_i, s) & \text{where } e = \{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n\}
\end{array}
\]


Although the above definition is correct, it does not lead to a well founded induction on
the structure of values because we may have circularly defined data-structures. However, at
any given step of evaluation, the size of a value and the number of locations reachable from it
are both finite, so we can easily compute the reachable locations using the following recursive
algorithm that is guaranteed to terminate:



Gather-Locations(v, s, L)

 1  case v of
 2    c                      : return L
 3    ⟨n-tup v_1, ..., v_n⟩  : for i ← 1 to n do
 4                                 L ← L ∪ Gather-Locations(v_i, s, L)
 5                             return L
 6    ⟨clsr f, x, a, e⟩      : let {x_1 ↦ v_1, ..., x_n ↦ v_n} = e
 7                             for i ← 1 to n do
 8                                 L ← L ∪ Gather-Locations(v_i, s, L)
 9                             return L
10    l                      : if l ∈ L ∨ l ∉ Dom(s) then return L
11                             else let v' = value(s(l))
12                                  return Gather-Locations(v', s, {l} ∪ L)



The above algorithm traverses the given value v in a depth-first recursive fashion and
accumulates the set of all its reachable locations in the variable L. If the current value is a valid
location of the given store, then its contents are recursively traversed at Line 12 only if it is not
already in the set L. Thus, no object accessible from the given value is traversed more than
once and the algorithm is guaranteed to terminate.

The reachability function given in Definition 3.2 can now be computed as follows:

\[ Reachable(v, s) = \text{Gather-Locations}(v, s, \emptyset) \]
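A runnable rendering of Gather-Locations is sketched below. The value representation is an assumption made for illustration: constants are ordinary Python scalars, n-tuples are Python tuples, closures are instances of a hypothetical Closure class carrying an environment dictionary, locations are instances of a hypothetical Loc class, and a store is a dictionary from Loc to a (value, tag) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A store location, identified by an address."""
    addr: int


@dataclass
class Closure:
    """A function closure <clsr f, x, body, env>."""
    f: str
    x: str
    body: object
    env: dict


def gather_locations(v, s, L):
    """Accumulate in L the locations of store s reachable from value v."""
    if isinstance(v, tuple):                   # n-tuple: visit each component
        for vi in v:
            L = L | gather_locations(vi, s, L)
        return L
    if isinstance(v, Closure):                 # closure: visit its environment
        for vi in v.env.values():
            L = L | gather_locations(vi, s, L)
        return L
    if isinstance(v, Loc):
        if v in L or v not in s:               # already seen, or not in Dom(s)
            return L
        value, _tag = s[v]
        return gather_locations(value, s, {v} | L)
    return L                                   # constant


def reachable(v, s):
    return gather_locations(v, s, set())
```

Because a location is added to L before its contents are visited, circular structures such as two locations that refer to each other are traversed only once.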



Evaluation Rules 



Figure 3.1 shows the axioms and inference rules for establishing evaluation judgments
$e \vdash a/s \Rightarrow r$. An axiom P allows us to conclude that the proposition P holds. An inference rule is
of the form:

\[ \frac{P_1 \quad \cdots \quad P_n}{P} \]

All the antecedents $P_1, \ldots, P_n$ must hold in order for us to conclude the consequent P.

The inference rules given in Figure 3.1 provide a strictr sequentiair call- by- value seman- 
tics for our kernel language. This can be seen from the fact that the store is sequentialized 






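CONST:
\[ e \vdash c/s \Rightarrow c/s \]

IDENT:
\[ \frac{x \in \mathit{Dom}(e)}{e \vdash x/s \Rightarrow e(x)/s} \]

ABS:
\[ \frac{Y = \mathcal{F}(f \text{ where } f(x) = a)}{e \vdash (f \text{ where } f(x) = a)/s \Rightarrow (\text{clsr } f, x, a, e|_Y)/s} \]

APP:
\[ \frac{e \vdash a_1/s \Rightarrow (\text{clsr } f, x, a_0, e_0)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow v_2/s_2 \qquad e_0 + \{f \mapsto (\text{clsr } f, x, a_0, e_0),\ x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s_3}{e \vdash (a_1\ a_2)/s \Rightarrow v/s_3} \]

TUPLE:
\[ \frac{e \vdash a_1/s \Rightarrow v_1/s_1 \quad \cdots \quad e \vdash a_n/s_{n-1} \Rightarrow v_n/s_n}{e \vdash (a_1, \ldots, a_n)/s \Rightarrow (n\text{-tup } v_1, \ldots, v_n)/s_n} \]

LET:
\[ \frac{e \vdash a_1/s \Rightarrow v_1/s_1 \qquad e + \{x \mapsto v_1\} \vdash a_2/s_1 \Rightarrow v_2/s_2}{e \vdash (\text{let } x = a_1 \text{ in } a_2)/s \Rightarrow v_2/s_2} \]

ALLOC:
\[ \frac{e \vdash a/s \Rightarrow v/s_1 \qquad l \notin \mathit{Dom}(s_1)}{e \vdash \text{ref}(a)/s \Rightarrow l/(s_1 + \{l \mapsto v, \text{rw}\})} \]

DEREF:
\[ \frac{e \vdash a/s \Rightarrow l/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{value}(s_1(l)) = v}{e \vdash\ !a/s \Rightarrow v/s_1} \]

ASSIGN:
\[ \frac{e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \text{rw}}{e \vdash\ {:=}(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \text{rw}\})} \]

CLOSE:
\[ \frac{e \vdash a/s \Rightarrow l/s_1 \qquad s_1(l) = v, \text{rw} \qquad \mathcal{L} = \mathit{Reachable}(l, s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in \mathit{Dom}(s)} \mathit{Reachable}(l', s_1)}{e \vdash (\text{close } a)/s \Rightarrow l/(s_1|_{\mathcal{L}} + \{l \mapsto v, \text{ro}\})} \]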

Figure 3.1: The Dynamic Semantics of the Kernel Expression Language. 

through various computations (APP, TUPLE, and LET rules) and that function and let bodies
are evaluated in an environment where arguments are bound to values (APP and LET rules).

Figure 3.1 only shows the inference rules that lead to the computation of a well-defined value.
Our convention for the generation or propagation of the err result is as follows. Some rules
have antecedents that require pattern matching: the operator in the APP rule must evaluate to a
closure value, the expression in the CLOSE rule must evaluate to a location with a read/write tag,
the expression in the DEREF rule must evaluate to a location, and the location to be assigned in
the ASSIGN rule must have a read/write tag. We add an err-generating inference rule for every
case of mismatch between any of these patterns and the actual values and tags found during
their evaluation. Similarly, err-propagating inference rules are added for each antecedent in
an inference rule that may generate an err result. In all these cases, the consequent simply
evaluates to the err result and all propositions following the error-generating antecedent are






ignored. 

Most of the axioms and inference rules shown in Figure 3.1 for the various kernel language
constructs are fairly standard and self-explanatory. We have shown the primitive operator rules
for reference operators only. The usual arithmetic and structural operators (such as tuple
projection) are defined in the standard way. The ALLOC rule initializes new reference locations
with a value and a read/write tag. We assume that an infinite set of new locations is available.
The DEREF rule reads the value out of an existing location regardless of its tag. The ASSIGN rule
only assigns to locations which have a read/write tag.
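The behaviour of the three reference rules on a tagged store can be sketched in Python. This is a minimal illustration under stated assumptions: the store is modelled as a dictionary from location names to (value, tag) pairs, and the names `EvalError`, `fresh`, `alloc`, `deref`, and `assign` are hypothetical helpers, not definitions from the report.

```python
class EvalError(Exception):
    """Stands in for the err result (hypothetical name)."""

def fresh(store):
    """Pick a location name that is not in Dom(store)."""
    n = 0
    while f"l{n}" in store:
        n += 1
    return f"l{n}"

def alloc(store, v):
    """ALLOC: bind a new location to v with a read/write tag."""
    l = fresh(store)
    return l, {**store, l: (v, "rw")}

def deref(store, l):
    """DEREF: read the value of an existing location, whatever its tag."""
    if l not in store:
        raise EvalError(f"{l} is not a location of the store")
    value, _tag = store[l]
    return value

def assign(store, l, v):
    """ASSIGN: update l only if it exists and carries a read/write tag."""
    if l not in store or store[l][1] != "rw":
        raise EvalError(f"{l} cannot be assigned")
    return {**store, l: (v, "rw")}
```

Each operation returns a new store instead of mutating its argument, which mirrors the way the rules thread the stores s, s_1, ... through a derivation.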

The CLOSE rule requires a little more explanation. This rule is the only place where the
read/write tag of a location is explicitly changed to a read-only tag. This makes that reference
object non-mutable. We have also restricted the domain of the final store to the reachable
locations of the location being closed, the current environment, and the locations of the initial
store. This operation removes some non-reachable garbage locations from the final store that
may contain references to the location being closed. Although this operation seems somewhat
artificial, it is of immense help in reducing the complexity of the soundness proof later on. We
motivate the reasons for doing so below.

A more intuitive semantic rule for the close construct would be:

CLOSE':
\[ \frac{e \vdash a/s \Rightarrow l/s_1 \qquad s_1(l) = v, \text{rw}}{e \vdash (\text{close } a)/s \Rightarrow l/(s_1 + \{l \mapsto v, \text{ro}\})} \]

This rule does not restrict the domain of the resulting store. Why would we want to
restrict the domain at all? The following example brings out the issue:

Example 3.1: 

a = close { 
b = ref 1; 
c = ref b; 
in b }; 

Within the scope of the close block, a freshly allocated reference c points to another fresh
reference b. Both these references are present in the store that is returned from the block
although there is no way to access the reference c once that block is exited. The unreachable
reference to b via c creates technical problems while showing the correspondence between the
static and dynamic semantics;² therefore we would like to get rid of it. One direct way of
achieving this is to restrict the domain of the final store to contain just the reachable locations,
as we have done in the CLOSE rule above.
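The difference between the two rules on Example 3.1 can be sketched in Python. The encoding is an assumption made for illustration only: stores are dictionaries from location names to (value, tag) pairs, values are integers, tuples, or `Loc` objects, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    name: str

def reachable(v, store, seen=frozenset()):
    """Locations reachable from v; values here are ints, tuples, or Loc."""
    if isinstance(v, Loc):
        if v.name in seen or v.name not in store:
            return seen
        return reachable(store[v.name][0], store, seen | {v.name})
    if isinstance(v, tuple):
        for vi in v:
            seen = reachable(vi, store, seen)
        return seen
    return seen

def close_unrestricted(s1, l):
    """The alternate close' rule: only the tag of l changes."""
    v, tag = s1[l.name]
    assert tag == "rw"
    return {**s1, l.name: (v, "ro")}

def close_restricted(s, env, s1, l):
    """The CLOSE rule: also restrict Dom(s1) to the reachable set."""
    v, tag = s1[l.name]
    assert tag == "rw"
    keep = reachable(l, s1)
    for value in env.values():          # Reachable(e, s1)
        keep |= reachable(value, s1)
    for name in s:                      # locations of the initial store
        keep |= reachable(Loc(name), s1)
    restricted = {name: cell for name, cell in s1.items() if name in keep}
    return {**restricted, l.name: (v, "ro")}

# Example 3.1: starting from an empty store and environment, the block
# body allocates b and c and returns b.
s, env = {}, {}
s1 = {"b": (1, "rw"), "c": (Loc("b"), "rw")}
unrestricted = close_unrestricted(s1, Loc("b"))
restricted = close_restricted(s, env, s1, Loc("b"))
```

In this sketch the unrestricted store still contains the garbage location c holding a read/write reference to the now read-only b, whereas the restricted store contains b alone.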

The alternate CLOSE' rule is not wrong. We just have to do more work while showing its
soundness: because of the garbage locations such as c scattered in the domain of the store, we
must restrict our attention at every step of the proof to just the reachable locations of the
current value and the current environment with respect to the current store. In technical terms, this
would imply that all our proofs must be carried out using the method of co-induction (due to
the possibility of having cyclic data-structures within the store) rather than a straightforward
induction on the structure of the current value and a separate induction involving all the
locations in the domain of the current store. Therefore, we have opted for the somewhat
non-intuitive CLOSE rule in order to avoid the complex semantic machinery required to show the
soundness of the alternate CLOSE' rule.



² Since we have not yet shown the static rule for close or the semantic machinery used to show soundness,
we request the reader to bear with us for the time being.






3.1.3 Properties of the Evaluation Rules 

In order to convert read/write store locations into read-only locations in a safe manner, we
need to characterize the allocation, reachability, and manipulation of store locations during an
evaluation. In this section, we show two important properties: locations reachable through the
result of an evaluation are either new locations or reachable through the evaluation environment
(Proposition 3.5), and old locations that get updated during an evaluation are always reachable
through the evaluation environment (Proposition 3.6). Both these propositions will be used
later in proving the soundness of the close construct. But first, we show some auxiliary
propositions.

It is evident from the evaluation rules presented in Figure 3.1 that the domain of the store
keeps growing during an evaluation. We do not model storage reclamation in these rules. This
allows us to state the following:

Proposition 3.3 Let $a$ be an expression, $v$ be a value, $e$ be an environment, and $s_0$, $s_1$ be initial
and final stores respectively such that $e \vdash a/s_0 \Rightarrow v/s_1$. Then $\mathit{Dom}(s_0) \subseteq \mathit{Dom}(s_1)$.

Proof: by induction on the length of the evaluation derivation for $a$. A simple examination of the
evaluation rules shows that in all cases except the CLOSE rule, either the domain of the store
grows or it remains unchanged. In the case of the CLOSE rule, the domain of the final store
is possibly smaller than that of the intermediate store due to the domain restriction, but it
still includes the entire domain of the initial store by construction. □

Next, we show that a given property applicable to all locations of a store extends inductively
to all values and environments that refer to the locations in that store.

Proposition 3.4 Let $e$ be an environment and $s_0$, $s_1$ be stores such that $\mathit{Dom}(s_0) \subseteq \mathit{Dom}(s_1)$,
and for all locations $l \in \mathit{Dom}(s_0)$,

\[ l' \in [\mathit{Reachable}(l, s_1) \setminus \mathit{Reachable}(l, s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

Then, for any value $v'$ and environment $e'$ we have,

\[ l' \in [\mathit{Reachable}(v', s_1) \setminus \mathit{Reachable}(v', s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]
\[ l' \in [\mathit{Reachable}(e', s_1) \setminus \mathit{Reachable}(e', s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

As a corollary, for $e' = e$ we have,

\[ l' \in \mathit{Reachable}(e, s_1) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

Proof: by structural induction on $v'$ and the values contained in the environment $e'$. We show
the various cases for values.

Case 1: $v'$ is $c$. Trivial, since there are no reachable locations from a constant.
Case 2: $v'$ is $(n\text{-tup } v_1, \ldots, v_n)$. By definition of reachability for tuples we have,

\[ \mathit{Reachable}((n\text{-tup } v_1, \ldots, v_n), s) = \bigcup_{1 \le i \le n} \mathit{Reachable}(v_i, s) \qquad (3.1) \]

The result follows from above using the induction hypothesis for each individual $v_i$ and the
following algebraic identity for arbitrary sets:

\[ \Bigl(\bigcup_{1 \le i \le n} X_i\Bigr) \setminus \Bigl(\bigcup_{1 \le i \le n} Y_i\Bigr) \subseteq \bigcup_{1 \le i \le n} (X_i \setminus Y_i) \qquad (3.2) \]






Case 3: $v'$ is $(\text{clsr } f, x, a_0, e_0)$. Same as above.

Case 4: $v'$ is $l$. If $l \notin \mathit{Dom}(s_1)$ or $l \notin \mathit{Dom}(s_0)$ then we have nothing to prove. Otherwise
the result follows from the given relation regarding locations.

The environment hypothesis follows from the value hypothesis using the definition of
reachability for environments and Equation 3.2. □

Now we prove the proposition that partitions the locations reachable from the result of an
evaluation into those that are freshly allocated and those that are reachable from the evaluation
environment.

Proposition 3.5 (Fresh Locations) Let $a$ be an expression, $v$ be a value, $e$ be an
environment, and $s_0$, $s_1$ be initial and final stores respectively such that $e \vdash a/s_0 \Rightarrow v/s_1$. Then,

\[ l' \in \mathit{Reachable}(v, s_1) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

and for all locations $l \in \mathit{Dom}(s_0)$,

\[ l' \in [\mathit{Reachable}(l, s_1) \setminus \mathit{Reachable}(l, s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

Proof: by induction on the length of the evaluation derivation for $a$. We consider the various cases
for the last evaluation rule in the derivation.

Case 1: CONST. Trivial, since $\mathit{Reachable}(c, s_1) = \phi$ and $s = s_1$.
Case 2: IDENT. Trivial, since $\mathit{Reachable}(e(x), s_1) \subseteq \mathit{Reachable}(e, s_1)$ and $s_0 = s_1$.
Case 3: ABS. Trivial, since $\mathit{Reachable}((\text{clsr } f, x, a, e|_Y), s_1) = \mathit{Reachable}(e|_Y, s_1) \subseteq \mathit{Reachable}(e, s_1)$ and $s_0 = s_1$.
Case 4: APP. The evaluation rule is:

\[ \frac{e \vdash a_1/s \Rightarrow (\text{clsr } f, x, a_0, e_0)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow v_2/s_2 \qquad e_0 + \{f \mapsto (\text{clsr } f, x, a_0, e_0),\ x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s_3}{e \vdash (a_1\ a_2)/s \Rightarrow v/s_3} \]

Let $e_1 = e_0 + \{f \mapsto (\text{clsr } f, x, a_0, e_0),\ x \mapsto v_2\}$. First, we show the value hypothesis for this
case, i.e., we show:

\[ l' \in \mathit{Reachable}(v, s_3) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.3) \]

Applying the induction hypothesis for values to the last premise we obtain:

\[ l' \in \mathit{Reachable}(v, s_3) \Longrightarrow l' \notin \mathit{Dom}(s_2) \lor l' \in \mathit{Reachable}(e_1, s_2) \qquad (3.4) \]

Note that $l' \notin \mathit{Dom}(s_2)$ implies $l' \notin \mathit{Dom}(s)$ because $\mathit{Dom}(s) \subseteq \mathit{Dom}(s_2)$ from
Proposition 3.3. If $l' \in \mathit{Reachable}(e_1, s_2)$, then using the definition of reachability and $e_1$ we have
the following two cases:
• $l' \in \mathit{Reachable}(e_0, s_2)$. In this case, we use the induction hypothesis for locations on
the second premise in Proposition 3.4 with environment $e' = e_0$ to obtain:

\[ l' \in [\mathit{Reachable}(e_0, s_2) \setminus \mathit{Reachable}(e_0, s_1)] \Longrightarrow l' \notin \mathit{Dom}(s_1) \lor l' \in \mathit{Reachable}(e, s_1) \qquad (3.5) \]






To eliminate $\mathit{Reachable}(e_0, s_1)$, we use the induction hypothesis for values on the first
premise to obtain:

\[ l' \in \mathit{Reachable}(e_0, s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.6) \]

Also, we simplify $l' \in \mathit{Reachable}(e, s_1)$ on the right-hand side of Equation 3.5 by
applying the corollary in Proposition 3.4 for the first premise:

\[ l' \in \mathit{Reachable}(e, s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.7) \]

Combining Equations 3.5, 3.6, and 3.7 we obtain the following as desired:

\[ l' \in \mathit{Reachable}(e_0, s_2) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.8) \]

• $l' \in \mathit{Reachable}(v_2, s_2)$. In this case, we use the induction hypothesis for values on
the second premise and then simplify as above using Proposition 3.4 to obtain,

\[ \begin{array}{rcl} l' \in \mathit{Reachable}(v_2, s_2) & \Longrightarrow & l' \notin \mathit{Dom}(s_1) \lor l' \in \mathit{Reachable}(e, s_1) \\ & \Longrightarrow & l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \end{array} \qquad (3.9) \]

Combining statements 3.8 and 3.9 proves the statement 3.3 as desired.

Now we show the location hypothesis, i.e., for all locations $l \in \mathit{Dom}(s)$ we show that:

\[ l' \in [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s)] \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.10) \]

We use the following algebraic identity that is true for arbitrary sets:

\[ X \setminus Y \subseteq (X \setminus Z) \cup (Z \setminus Y) \qquad (3.11) \]

Using this identity, we obtain:

\[ \begin{array}{l} [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s)] \\ \quad \subseteq [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s_2)] \cup [\mathit{Reachable}(l, s_2) \setminus \mathit{Reachable}(l, s)] \\ \quad \subseteq [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s_2)] \cup [\mathit{Reachable}(l, s_2) \setminus \mathit{Reachable}(l, s_1)] \cup [\mathit{Reachable}(l, s_1) \setminus \mathit{Reachable}(l, s)] \end{array} \qquad (3.12) \]

Now we use the induction hypothesis for locations for each of the three clauses on the right
and simplify using Propositions 3.4 and 3.3 to obtain the desired result.
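The set identity 3.11 used in this case is stated without proof. A short case split, sketched here as a reading aid and not taken from the report, shows why it holds for arbitrary sets:

```latex
\begin{align*}
&\text{Let } x \in X \setminus Y, \text{ so that } x \in X \text{ and } x \notin Y. \\
&\text{If } x \in Z: \quad x \in Z \text{ and } x \notin Y, \text{ hence } x \in Z \setminus Y. \\
&\text{If } x \notin Z: \quad x \in X \text{ and } x \notin Z, \text{ hence } x \in X \setminus Z. \\
&\text{In both cases } x \in (X \setminus Z) \cup (Z \setminus Y).
\end{align*}
```

Applying the identity once with $Z = \mathit{Reachable}(l, s_2)$ and once more with $Z = \mathit{Reachable}(l, s_1)$ gives the three-clause decomposition of statement 3.12.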
Case 5: TUPLE. The location hypothesis is shown exactly like the case above. We give the
argument for the value hypothesis. The evaluation rule is:

\[ \frac{e \vdash a_1/s_0 \Rightarrow v_1/s_1 \quad \cdots \quad e \vdash a_n/s_{n-1} \Rightarrow v_n/s_n}{e \vdash (a_1, \ldots, a_n)/s_0 \Rightarrow (n\text{-tup } v_1, \ldots, v_n)/s_n} \]

We have to show that:

\[ l' \in \mathit{Reachable}((n\text{-tup } v_1, \ldots, v_n), s_n) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \qquad (3.13) \]

Applying the induction hypothesis for values to each premise ($1 \le i \le n$) and simplifying
using Propositions 3.3 and 3.4 we obtain:

\[ l' \in \mathit{Reachable}(v_i, s_i) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \qquad (3.14) \]






In order to show Equation 3.13, we need to strengthen Equation 3.14 to $l' \in \mathit{Reachable}(v_i, s_n)$
($1 \le i \le n$). We use the algebraic identity 3.11 repeatedly to obtain the following:

\[ [\mathit{Reachable}(v_i, s_n) \setminus \mathit{Reachable}(v_i, s_i)] \subseteq \bigcup_{i < j \le n} [\mathit{Reachable}(v_i, s_j) \setminus \mathit{Reachable}(v_i, s_{j-1})] \qquad (3.15) \]

We use the induction hypothesis for locations and Proposition 3.4 to simplify each of the
clauses on the right in the above statement and plug in Equation 3.14 to obtain the desired
result of Equation 3.13.
Case 6: LET. Same argument as in the APP case.
Case 7: ALLOC. The result follows from the induction hypothesis and the fact that the
allocated location is chosen to be a new location that is not present in $\mathit{Dom}(s_1)$ and
hence not present in $\mathit{Dom}(s)$.
Case 8: DEREF. The result follows directly from the induction hypothesis and the definition
of reachability for locations.
Case 9: ASSIGN. The evaluation rule is:

\[ \frac{e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \text{rw}}{e \vdash\ {:=}(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \text{rw}\})} \]

The value hypothesis follows immediately since no locations are reachable from (). For
the location hypothesis, note that the final store $s_2 = s_1 + \{l \mapsto v, \text{rw}\}$ differs from the
intermediate store $s_1$ only at location $l$. Furthermore, using the induction hypothesis for
values we know that:

\[ l' \in \mathit{Reachable}(v, s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.16) \]

Thus, the location hypothesis will be valid for the location $l$ as well, which is assigned the
new value $v$.
Case 10: CLOSE. By construction, the final store contains all the reachable locations from
the location being closed, the current environment, and the old store. Thus, both value
and location hypotheses follow directly from the induction hypothesis since changing the
tag of a location does not affect its reachability.

□ 

Finally, we show the proposition that characterizes the set of locations that may get updated
during an evaluation.

Proposition 3.6 (Updated Locations) Let $a$ be an expression, $v$ be a value, $e$ be an
environment, and $s_0$, $s_1$ be initial and final stores respectively such that $e \vdash a/s_0 \Rightarrow v/s_1$. Then for
any location $l \in \mathit{Dom}(s_0)$,

\[ \mathit{value}(s_0(l)) \ne \mathit{value}(s_1(l)) \Longrightarrow l \in \mathit{Reachable}(e, s_0) \]

That is, pre-existing locations that get updated during an evaluation are reachable from the
environment.

Proof: by induction on the length of the evaluation derivation for $a$. We consider the various cases
for the last evaluation rule in the derivation.
Case 1: CONST, IDENT, and ABS. Trivial, since $s = s_1$.






Case 2: APP. The evaluation rule is:

\[ \frac{e \vdash a_1/s \Rightarrow (\text{clsr } f, x, a_0, e_0)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow v_2/s_2 \qquad e_0 + \{f \mapsto (\text{clsr } f, x, a_0, e_0),\ x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s_3}{e \vdash (a_1\ a_2)/s \Rightarrow v/s_3} \]

Three possibilities arise for $\mathit{value}(s(l)) \ne \mathit{value}(s_3(l))$:

1. $\mathit{value}(s(l)) \ne \mathit{value}(s_1(l))$. The result follows immediately by applying the induction
hypothesis to the first premise.

2. $\mathit{value}(s(l)) = \mathit{value}(s_1(l))$ but $\mathit{value}(s_1(l)) \ne \mathit{value}(s_2(l))$. Using the induction
hypothesis on the second premise we obtain that $l \in \mathit{Reachable}(e, s_1)$. Using
Proposition 3.5 together with Proposition 3.4 for environments we obtain that

\[ l \in \mathit{Reachable}(e, s_1) \Longrightarrow l \notin \mathit{Dom}(s) \lor l \in \mathit{Reachable}(e, s) \qquad (3.17) \]

Since we know that $l \in \mathit{Dom}(s)$, the result follows.

3. $\mathit{value}(s(l)) = \mathit{value}(s_1(l)) = \mathit{value}(s_2(l))$ but $\mathit{value}(s_2(l)) \ne \mathit{value}(s_3(l))$. Using the
induction hypothesis on the third premise we obtain that $l \in \mathit{Reachable}(e_1, s_2)$ where
$e_1 = e_0 + \{f \mapsto (\text{clsr } f, x, a_0, e_0),\ x \mapsto v_2\}$. This can be simplified to the desired result
just as in the proof of Proposition 3.5.

Case 3: TUPLE and LET. Same argument as above.

Case 4: ALLOC and DEREF. The result follows directly from the induction hypothesis.

Case 5: ASSIGN. The evaluation rule is:

\[ \frac{e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \text{rw}}{e \vdash\ {:=}(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \text{rw}\})} \]

For all locations other than $l$, the result follows from the induction hypothesis. In case of
location $l$, we apply Proposition 3.5 to the first premise and obtain that,

\[ l' \in \mathit{Reachable}((l, v), s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.18) \]

It is clear that $l$ is reachable from the pair $(l, v)$. The result follows from the above
statement and the induction hypothesis that $l \in \mathit{Dom}(s)$.
Case 6: CLOSE. All locations reachable from the initial store $s$ are included in the final
store by construction. Furthermore, the values present at these locations are the same as
those in the store $s_1$. Thus, the result follows from the induction hypothesis.

□ 

3.2 A Closure Typing System 

Now we will describe our extension to Xavier Leroy's closure typing system [Ler92]. 

3.2.1 Type Syntax 

The type grammar is defined below: 






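\[
\begin{array}{lrcll}
\text{TYPE VARIABLES:} & \alpha, \beta & ::= & t & \text{regular type variable} \\
 & & | & u & \text{closure extension variable} \\
 & & | & r & \text{region variable} \\[4pt]
\text{TYPES:} & \tau & ::= & t & \text{regular type variable} \\
 & & | & \iota & \text{base type} \\
 & & | & \tau_1 -\langle\pi\rangle\!\to \tau_2 & \text{function type} \\
 & & | & \tau_1, \ldots, \tau_n & n\text{-tuple type} \\
 & & | & \tau\ \mathit{ref}(r) & \text{mutable reference type} \\
 & & | & \tau\ \mathit{ref}(\epsilon) & \text{non-mutable reference type} \\[4pt]
\text{CLOSURE TYPES:} & \pi & ::= & u & \text{closure extension variable} \\
 & & | & \sigma, \pi & \text{closure type} \\[4pt]
\text{REGIONS:} & \rho & ::= & r & \text{region variable} \\
 & & | & \epsilon & \text{null region} \\[4pt]
\text{TYPE SCHEMES:} & \sigma & ::= & \forall \alpha_1 \ldots \alpha_n.\ \tau &
\end{array}
\]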



In this grammar, a function type ($\to$) is decorated with a CLOSURE TYPE which is a set
of type schemes together with a closure extension variable $u$. The closure type of a function
corresponds to the type schemes of the free identifiers of the function that are stored in its
closure environment. The order of occurrence of the type schemes in a closure type does not
matter. Note that the above grammar does not allow more than one closure extension variable
in a closure type.

A reference type is parameterized by a REGION expression which could be a region variable
$r$ or the null region constant $\epsilon$. Regions serve to model the mutability of store locations, while
types serve to model the structure of dynamic values. That is why the domain of regions is
much simpler than the domain of types.

A region variable parameter $r$ on a mutable reference type serves two purposes. It identifies
the reference type as being mutable and it also serves as an abstract static label for the
corresponding dynamic mutable location (and any other locations aliased to it) that has that type.
This abstraction is useful in tracking the dynamic mutable locations reachable from a given
object by statically observing the region variables present within its type. We will formalize this
correspondence between region variables and mutable locations in Section 3.3. Non-mutable
or "closed" references are identified by a fixed null region constant ($\epsilon$) because there is no need
to keep track of locations that have been closed. Note that $\mathit{ref}(r)$ and $\mathit{ref}(\epsilon)$ are considered to
be distinct type constructors; they have a similar form only for syntactic uniformity.

For any type object $T$, where $T$ may be a type, a closure type, a region, or a type scheme,
its FREE VARIABLES $\mathcal{F}(T)$ are defined inductively as follows:³



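\[
\begin{array}{rclcrcl}
\mathcal{F}(t) & = & \{t\} & \qquad & \mathcal{F}(u) & = & \{u\} \\
\mathcal{F}(\iota) & = & \phi & & \mathcal{F}(\sigma, \pi) & = & \mathcal{F}(\sigma) \cup \mathcal{F}(\pi) \\
\mathcal{F}(\tau_1 -\langle\pi\rangle\!\to \tau_2) & = & \mathcal{F}(\tau_1) \cup \mathcal{F}(\pi) \cup \mathcal{F}(\tau_2) & & \mathcal{F}(r) & = & \{r\} \\
\mathcal{F}(\tau_1, \ldots, \tau_n) & = & \bigcup_{1 \le i \le n} \mathcal{F}(\tau_i) & & \mathcal{F}(\epsilon) & = & \phi \\
\mathcal{F}(\tau\ \mathit{ref}(\rho)) & = & \mathcal{F}(\tau) \cup \mathcal{F}(\rho) & & \mathcal{F}(\forall \alpha_1 \ldots \alpha_n.\ \tau) & = & \mathcal{F}(\tau) \setminus \{\alpha_1 \ldots \alpha_n\}
\end{array}
\]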



In a type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau$, the variables $\{\alpha_1 \ldots \alpha_n\}$ are called the BOUND VARIABLES
denoted by $\mathcal{B}(\sigma)$. For any type object $T$, we also define the DANGEROUS VARIABLES $\mathcal{D}(T)$ and
the DANGEROUS REGION VARIABLES $\mathcal{R}(T)$ inductively as follows:



³ Note that we are using the same notation here as that for computing the free identifiers of an expression
because it represents the same concept. The meaning is always clear by context since we never mix types and
expressions.






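\[
\begin{array}{rclcrcl}
\mathcal{D}(t) & = & \phi & \qquad & \mathcal{D}(u) & = & \phi \\
\mathcal{D}(\iota) & = & \phi & & \mathcal{D}(\sigma, \pi) & = & \mathcal{D}(\sigma) \cup \mathcal{D}(\pi) \\
\mathcal{D}(\tau_1 -\langle\pi\rangle\!\to \tau_2) & = & \mathcal{D}(\pi) & & \mathcal{D}(r) & = & \phi \\
\mathcal{D}(\tau_1, \ldots, \tau_n) & = & \bigcup_{1 \le i \le n} \mathcal{D}(\tau_i) & & \mathcal{D}(\epsilon) & = & \phi \\
\mathcal{D}(\tau\ \mathit{ref}(r)) & = & \mathcal{F}(\tau\ \mathit{ref}(r)) & & \mathcal{D}(\forall \alpha_1 \ldots \alpha_n.\ \tau) & = & \mathcal{D}(\tau) \setminus \{\alpha_1 \ldots \alpha_n\} \\
\mathcal{D}(\tau\ \mathit{ref}(\epsilon)) & = & \mathcal{D}(\tau) & & & & \\[8pt]
\mathcal{R}(t) & = & \phi & & \mathcal{R}(u) & = & \phi \\
\mathcal{R}(\iota) & = & \phi & & \mathcal{R}(\sigma, \pi) & = & \mathcal{R}(\sigma) \cup \mathcal{R}(\pi) \\
\mathcal{R}(\tau_1 -\langle\pi\rangle\!\to \tau_2) & = & \mathcal{R}(\pi) & & \mathcal{R}(r) & = & \phi \\
\mathcal{R}(\tau_1, \ldots, \tau_n) & = & \bigcup_{1 \le i \le n} \mathcal{R}(\tau_i) & & \mathcal{R}(\epsilon) & = & \phi \\
\mathcal{R}(\tau\ \mathit{ref}(\rho)) & = & \mathcal{R}(\tau) \cup \mathcal{F}(\rho) & & \mathcal{R}(\forall \alpha_1 \ldots \alpha_n.\ \tau) & = & \mathcal{R}(\tau) \setminus \{\alpha_1 \ldots \alpha_n\}
\end{array}
\]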



Specifically, for a mutable reference, the region associated with that type and all type
variables contained within it are considered to be dangerous. The variables occurring inside a
non-mutable reference type are not considered to be dangerous. For a function closure, the
typing rules shown later ensure that the types of all objects reachable from the closure
environment are recorded in its closure type. Therefore, the types of mutable references accessible via
the closure environment are also visible in its closure type and are considered to be dangerous.

Using the type abstractions shown above, we can accurately capture and control the static
(type polymorphism) and the dynamic (mutability) properties of imperative data-structures.
The basic idea of our type system is to use the type of a composite object as a clue to the
reachable mutable reference locations contained within it. Dangerous variables provide this
clue directly from the overall type of an object. Intuitively, dangerous type variables model the
polymorphic values stored within mutable objects and the dangerous region variables model
the mutable locations contained within such objects.
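The three inductive definitions can be sketched in Python over a small abstract syntax for type objects. The classes below (`TVar`, `Base`, `RVar`, `Null`, `Fun`, `Tup`, `Ref`, `Clo`, `Scheme`) and the function names are hypothetical encodings chosen for this illustration; in particular, a closure type is assumed to be represented as a tuple of type schemes plus one extension variable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:            # regular type variable t
    name: str
@dataclass(frozen=True)
class Base:            # base type
    name: str
@dataclass(frozen=True)
class RVar:            # region variable r
    name: str
@dataclass(frozen=True)
class Null:            # null region
    pass
@dataclass(frozen=True)
class Fun:             # function type decorated with a closure type
    arg: object
    clo: object
    res: object
@dataclass(frozen=True)
class Tup:             # n-tuple type
    items: tuple
@dataclass(frozen=True)
class Ref:             # reference type; region is an RVar or Null
    elem: object
    region: object
@dataclass(frozen=True)
class Clo:             # closure type: type schemes plus extension variable
    schemes: tuple
    ext: str
@dataclass(frozen=True)
class Scheme:          # type scheme with a set of bound variable names
    bound: frozenset
    body: object

def union(sets):
    return frozenset().union(*sets)

def free(T):
    """Free variables of a type object."""
    if isinstance(T, (TVar, RVar)):
        return frozenset({T.name})
    if isinstance(T, (Base, Null)):
        return frozenset()
    if isinstance(T, Fun):
        return free(T.arg) | free(T.clo) | free(T.res)
    if isinstance(T, Tup):
        return union(free(t) for t in T.items)
    if isinstance(T, Ref):
        return free(T.elem) | free(T.region)
    if isinstance(T, Clo):
        return frozenset({T.ext}) | union(free(s) for s in T.schemes)
    if isinstance(T, Scheme):
        return free(T.body) - T.bound
    raise TypeError(T)

def dangerous(T):
    """Dangerous variables of a type object."""
    if isinstance(T, (TVar, RVar, Base, Null)):
        return frozenset()
    if isinstance(T, Fun):
        return dangerous(T.clo)
    if isinstance(T, Tup):
        return union(dangerous(t) for t in T.items)
    if isinstance(T, Ref):
        return free(T) if isinstance(T.region, RVar) else dangerous(T.elem)
    if isinstance(T, Clo):
        return union(dangerous(s) for s in T.schemes)
    if isinstance(T, Scheme):
        return dangerous(T.body) - T.bound
    raise TypeError(T)

def dangerous_regions(T):
    """Dangerous region variables of a type object."""
    if isinstance(T, (TVar, RVar, Base, Null)):
        return frozenset()
    if isinstance(T, Fun):
        return dangerous_regions(T.clo)
    if isinstance(T, Tup):
        return union(dangerous_regions(t) for t in T.items)
    if isinstance(T, Ref):
        return dangerous_regions(T.elem) | free(T.region)
    if isinstance(T, Clo):
        return union(dangerous_regions(s) for s in T.schemes)
    if isinstance(T, Scheme):
        return dangerous_regions(T.body) - T.bound
    raise TypeError(T)

# A pair of a mutable reference and a closed reference.
mixed = Tup((Ref(TVar("t1"), RVar("r1")), Ref(TVar("t2"), Null())))
# A function whose closure captures a mutable reference.
fn = Fun(Base("int"),
         Clo((Scheme(frozenset(), Ref(TVar("t1"), RVar("r1"))),), "u1"),
         Base("int"))
```

For `mixed`, only the variables under the mutable reference are dangerous; for `fn`, the dangerous variables come entirely from the closure type, as the preceding paragraph describes.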

3.2.2 Static Semantics 

The static semantics of our kernel language is defined in the same manner as its dynamic
semantics. We define a predicate relation between syntactic expressions and types that states that
a given expression elaborates to a given type. This relation, called an ELABORATION JUDGMENT,
is of the following form:

\[ E \vdash a : \tau \]

Here $E$ is a TYPE ENVIRONMENT which is defined below as a finite mapping from identifiers to
type schemes.

\[ \text{TYPE ENVIRONMENTS:} \quad E ::= \{x_1 \mapsto \sigma_1, \ldots, x_n \mapsto \sigma_n\} \]

TYPE SUBSTITUTIONS over this type algebra are finite mappings from regular type variables
to types, from closure extension variables to closure types, and from region variables to other
region variables. We do not allow substituting region variables with the null region ($\epsilon$) because
that would convert a mutable reference type into a non-mutable reference type. This operation
should only be performed when it is determined to be safe and is explicitly done using the
close construct.

\[ \text{TYPE SUBSTITUTIONS:} \quad \varphi, \psi ::= \{t \mapsto \tau, \ldots, u \mapsto \pi, \ldots, r \mapsto r', \ldots\} \]

Type substitutions are taken to be the identity mapping outside their specified finite domain.
They also extend naturally over types, closure types, and type schemes, being applied to their





free variables in each case. For a type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau$, it may be necessary to rename
some of its bound variables $\alpha_i$ so that they are OUT OF REACH for the type substitution $\varphi$, i.e.,
no $\alpha_i$ is in $\mathit{Dom}(\varphi)$ and no $\alpha_i$ occurs free in any type, closure type or region in $\mathit{CoDom}(\varphi)$.
Then, the substitution is defined by:

\[ \varphi(\forall \alpha_1 \ldots \alpha_n.\ \tau) = \forall \alpha_1 \ldots \alpha_n.\ \varphi(\tau) \]

The INSTANTIATION of a type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau_0$ to a type $\tau$, written as $\sigma \succ \tau$, is
defined if there exists a type substitution $\varphi$ with $\mathit{Dom}(\varphi) \subseteq \{\alpha_1 \ldots \alpha_n\}$ such that $\tau = \varphi(\tau_0)$.

In order to simplify our notation for computing free and dangerous variables of sets of
objects, we use the following convention:

Notation 3.7 Given a set of objects $P$,

2. $\bigcup_{p \in P} \mathcal{F}(p) = \mathcal{F}(P)$.
3. $\bigcup_{p \in P} \mathcal{D}(p) = \mathcal{D}(P)$.
The effect of type substitutions on the free and dangerous variables is now captured in the 
following proposition: 

Proposition 3.8 Let $\varphi$ be a type substitution. For any $T$, where $T$ could be a type $\tau$, a closure
type $\pi$, a region $\rho$, or a type scheme $\sigma$, we have:

\[ \mathcal{F}(\varphi(T)) = \mathcal{F}(\varphi(\mathcal{F}(T))) \]

\[ \mathcal{F}(\varphi(\mathcal{D}(T))) \subseteq \mathcal{D}(\varphi(T)) \subseteq \mathcal{F}(\varphi(\mathcal{D}(T))) \cup \mathcal{D}(\varphi(\mathcal{F}(T))) \]

Proof: Both these relations follow directly from the definitions of $\mathcal{F}(T)$ and $\mathcal{D}(T)$ by a
simultaneous structural induction over the appropriate type object $T$. □

The first equation provides an exact relationship between the free variables of a type before
and after applying a type substitution to it. On the other hand, the second pair of inequalities
provides only an approximation to the set of dangerous variables of a type after applying a
type substitution to it. This is so because the substitution images of dangerous variables of a
type ($\mathcal{F}(\varphi(\mathcal{D}(\tau)))$) may not cover all the dangerous variables of the substituted type ($\mathcal{D}(\varphi(\tau))$).
Some non-dangerous variable may get substituted with a type containing dangerous variables
that must also be counted as dangerous in the final type.
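A small worked instance may make the second relation of Proposition 3.8 concrete. The type $\tau$ and the substitution $\varphi$ below are chosen purely for illustration and do not appear in the report:

```latex
\begin{align*}
\tau &= (t_1,\ t_2\ \mathit{ref}(r)), \qquad \varphi = \{t_1 \mapsto t_3\ \mathit{ref}(r')\} \\
\mathcal{F}(\tau) &= \{t_1, t_2, r\}, \qquad \mathcal{D}(\tau) = \{t_2, r\} \\
\varphi(\tau) &= (t_3\ \mathit{ref}(r'),\ t_2\ \mathit{ref}(r)) \\
\mathcal{D}(\varphi(\tau)) &= \{t_3, r', t_2, r\} \\
\mathcal{F}(\varphi(\mathcal{D}(\tau))) &= \mathcal{F}(\{t_2, r\}) = \{t_2, r\} \\
\mathcal{D}(\varphi(\mathcal{F}(\tau))) &= \mathcal{D}(t_3\ \mathit{ref}(r')) \cup \mathcal{D}(t_2) \cup \mathcal{D}(r) = \{t_3, r'\}
\end{align*}
```

In this instance the lower bound $\{t_2, r\}$ is strictly smaller than $\mathcal{D}(\varphi(\tau))$ because the non-dangerous variable $t_1$ is replaced by a mutable reference type, while the upper bound $\{t_2, r\} \cup \{t_3, r'\}$ is attained exactly.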

Typing Rules 

Figure 3.2 shows the axioms and the inference rules for establishing elaboration judgments
$E \vdash a : \tau$. The CONST and the PRIMAPP rules establish the elaboration judgment for a constant
or a primitive operator application according to a predefined relation typeof that provides the
type scheme associated with them. All such predefined type schemes are fully quantified: there
are no free variables in these type schemes. Most constants and operators have the obvious type
schemes. We only show the predefined type schemes of the three reference operators below:

\[
\begin{array}{rcl}
\mathit{typeof}(\text{ref}) & = & \forall t, u, r.\ t -\langle u \rangle\!\to t\ \mathit{ref}(r) \\
\mathit{typeof}(!_{\text{mutable}}) & = & \forall t, u, r.\ t\ \mathit{ref}(r) -\langle u \rangle\!\to t \\
\mathit{typeof}(!_{\text{non-mutable}}) & = & \forall t, u.\ t\ \mathit{ref}(\epsilon) -\langle u \rangle\!\to t \\
\mathit{typeof}(:=) & = & \forall t, u, r.\ (t\ \mathit{ref}(r), t) -\langle u \rangle\!\to \text{unit}
\end{array}
\]






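CONST:
\[ \frac{\mathit{typeof}(c) \succ \tau}{E \vdash c : \tau} \]

PRIMAPP:
\[ \frac{\mathit{typeof}(op) \succ \tau_1 -\langle\pi\rangle\!\to \tau_2 \qquad E \vdash a : \tau_1}{E \vdash op(a) : \tau_2} \]

IDENT:
\[ \frac{x \in \mathit{Dom}(E) \qquad E(x) \succ \tau}{E \vdash x : \tau} \]

ABS:
\[ \frac{\{y_1 \ldots y_n\} = \mathcal{F}(f \text{ where } f(x) = a) \qquad E + \{f \mapsto \tau_1 -\langle E(y_1), \ldots, E(y_n), \pi \rangle\!\to \tau_2,\ x \mapsto \tau_1\} \vdash a : \tau_2}{E \vdash (f \text{ where } f(x) = a) : \tau_1 -\langle E(y_1), \ldots, E(y_n), \pi \rangle\!\to \tau_2} \]

APP:
\[ \frac{E \vdash a_1 : \tau_1 -\langle\pi\rangle\!\to \tau_2 \qquad E \vdash a_2 : \tau_1}{E \vdash a_1\ a_2 : \tau_2} \]

TUPLE:
\[ \frac{E \vdash a_1 : \tau_1 \quad \cdots \quad E \vdash a_n : \tau_n}{E \vdash a_1, \ldots, a_n : \tau_1, \ldots, \tau_n} \]

LET:
\[ \frac{E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{Gen}(E, \tau_1)\} \vdash a_2 : \tau_2}{E \vdash (\text{let } x = a_1 \text{ in } a_2) : \tau_2} \]

CLOSE:
\[ \frac{E \vdash a : \tau\ \mathit{ref}(r) \qquad r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))}{E \vdash (\text{close } a) : \tau\ \mathit{ref}(\epsilon)} \]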
Figure 3.2: The Static Semantics of the Kernel Expression Language. 

There are two different types for the dereference operator (!), one for mutable references
and the other for non-mutable references. This is because we consider mutable reference types
as distinct from non-mutable reference types. Essentially, we overload the use of the dereference
operation with these two types. This does not create any problem since the exact type to be
used is always clear from context. Moreover, in our kernel language, the underlying dynamic
dereferencing operation is the same in both cases.

The IDENT rule instantiates the type scheme of an identifier stored in the type environment.
The ABS rule shows how closure types are created in this system. The type schemes of all the
free identifiers of the function are stored in its closure type. This is necessary to keep track
of the mutable locations accessible through the closure environment. The APP and the TUPLE
rules are self-explanatory. The APP rule also handles primitive operator applications.

The LET rule allows a type to be quantified and added to the type environment as a type
scheme. The GENERALIZATION operation in the LET rule is defined as follows:

\[ \mathit{Gen}(E, \tau) = \forall \alpha_1 \ldots \alpha_n.\ \tau \quad \text{where } \{\alpha_1 \ldots \alpha_n\} = \mathcal{F}(\tau) \setminus \mathcal{D}(\tau) \setminus \mathcal{F}(E) \]
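The effect of the dangerous-variable restriction on generalization can be sketched in Python for a deliberately reduced type language. Everything in this sketch is an assumption made for illustration: types are nested tuples restricted to variables, base types, tuples, and references (function and closure types are omitted), a region is either a name or `None` for the null region, and a type scheme is a pair of a set of bound names and a body.

```python
def free(t):
    """Free variables of a (reduced) type."""
    kind = t[0]
    if kind == "var":                       # ("var", name)
        return frozenset({t[1]})
    if kind == "base":                      # ("base", name)
        return frozenset()
    if kind == "tup":                       # ("tup", [types])
        return frozenset().union(*(free(ti) for ti in t[1]))
    if kind == "ref":                       # ("ref", elem, region or None)
        region = frozenset() if t[2] is None else frozenset({t[2]})
        return free(t[1]) | region
    raise ValueError(t)

def dangerous(t):
    """Dangerous variables of a (reduced) type."""
    kind = t[0]
    if kind in ("var", "base"):
        return frozenset()
    if kind == "tup":
        return frozenset().union(*(dangerous(ti) for ti in t[1]))
    if kind == "ref":
        # Mutable reference: everything free in it is dangerous.
        # Closed reference: only what is dangerous in its element type.
        return free(t) if t[2] is not None else dangerous(t[1])
    raise ValueError(t)

def gen(env, t):
    """Gen(E, t): quantify variables free in t, not dangerous in t, not free in E."""
    env_free = frozenset().union(
        *(free(body) - bound for bound, body in env.values()))
    return (free(t) - dangerous(t) - env_free, t)

mutable_ref = ("ref", ("var", "t"), "r")    # t ref(r)
closed_ref = ("ref", ("var", "t"), None)    # t ref with the null region
```

Under these assumptions `gen({}, mutable_ref)` quantifies nothing, while `gen({}, closed_ref)` quantifies `t`, which is the practical payoff of converting a reference with close before it is bound by let.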

Finally, the CLOSE rule converts a mutable reference type into a non-mutable reference type
by erasing its region variable and replacing it with the null region ($\epsilon$). This is an explicit type
conversion operation on the mutable reference type. The side condition ensures the soundness of
this operation by checking that the region being closed does not escape from the current scope.






This is the exact formalization of the informal closing strategies described in Section 2.4. 

3.2.3 Properties of the Typing Rules 

In this section, we will present some syntactic properties of the typing rules presented above.
The most important property is the following proposition that states that typing is stable under
type substitution. This property is essential for performing type inference (Section 3.4) because
it guarantees that all incremental type refinements (via type substitutions) to a given typing
of an expression yield legal typings of that expression. Thus, the typing of an expression can
be automatically refined to match that of its enclosing context.

Proposition 3.9 (Stability under Type Substitution) Let $a$ be an expression, $\tau$ be a type,
$E$ be a type environment, and $\varphi$ be a substitution. If $E \vdash a : \tau$, then $\varphi(E) \vdash a : \varphi(\tau)$.

Proof: by structural induction over $a$. For completeness, we show all the cases.
Case 1: $a$ is $c$. The CONST rule applies:

\[ \frac{\mathit{typeof}(c) \succ \tau}{E \vdash c : \tau} \]

Let $\mathit{typeof}(c) = \forall \alpha_1 \ldots \alpha_n.\ \tau_0$ and $\psi$ be its instantiation substitution such that $\tau = \psi(\tau_0)$
with $\mathit{Dom}(\psi) \subseteq \{\alpha_1 \ldots \alpha_n\}$. After renaming if necessary, assume that $\alpha_i$ are out of reach
for $\varphi$. Now define a substitution $\psi'$ with domain $\{\alpha_1 \ldots \alpha_n\}$ such that $\psi'(\alpha_i) = \varphi(\psi(\alpha_i))$.
Since the type scheme $\mathit{typeof}(c)$ is assumed to be fully quantified, there are no free
variables in $\tau_0$ other than the $\alpha_i$. Thus $\psi'(\tau_0) = \varphi(\psi(\tau_0)) = \varphi(\tau)$, which implies that
$\mathit{typeof}(c) \succ \varphi(\tau)$. The desired result follows using the CONST rule.
Case 2: $a$ is $op(a)$. We proceed exactly like the previous case to show that
$\mathit{typeof}(op) \succ \varphi(\tau_1 -\langle\pi\rangle\!\to \tau_2)$. The desired result follows from the induction hypothesis on the second
antecedent.
Case 3: $a$ is $x$. The IDENT rule applies:

\[ \frac{x \in \mathit{Dom}(E) \qquad E(x) \succ \tau}{E \vdash x : \tau} \]

Let $E(x) = \forall \alpha_1 \ldots \alpha_n.\ \tau_0$ and $\psi$ be its instantiation substitution such that $\tau = \psi(\tau_0)$ with
$\mathit{Dom}(\psi) \subseteq \{\alpha_1 \ldots \alpha_n\}$. After renaming if necessary, assume that $\alpha_i$ are out of reach for $\varphi$,
so that $\varphi(E(x)) = \forall \alpha_1 \ldots \alpha_n.\ \varphi(\tau_0)$. Now define a substitution $\psi'$ with domain $\{\alpha_1 \ldots \alpha_n\}$
such that $\psi'(\alpha_i) = \varphi(\psi(\alpha_i))$. We have,

\[
\begin{array}{rcll}
\psi'(\varphi(\alpha_i)) & = & \psi'(\alpha_i) = \varphi(\psi(\alpha_i)) & \forall i, \text{ since } \alpha_i \text{ are out of reach of } \varphi \\
\psi'(\varphi(\beta)) & = & \varphi(\beta) = \varphi(\psi(\beta)) & \forall \beta \ne \alpha_i
\end{array}
\]

Thus $\psi'(\varphi(\tau_0)) = \varphi(\psi(\tau_0)) = \varphi(\tau)$, which implies that $\varphi(E(x)) \succ \varphi(\tau)$. This allows us to
conclude $\varphi(E) \vdash x : \varphi(\tau)$ as desired.
Case 4: $a$ is $(f \text{ where } f(x) = a)$, $(a_1, \ldots, a_n)$, or $(a_1\ a_2)$. All these cases follow immediately
using the induction hypothesis on their respective antecedents.
Case 5: $a$ is $(\text{let } x = a_1 \text{ in } a_2)$. The typing derivation ends in the LET rule:

\[ \frac{E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{Gen}(E, \tau_1)\} \vdash a_2 : \tau_2}{E \vdash \text{let } x = a_1 \text{ in } a_2 : \tau_2} \]






By definition of generalization we haver 

Gen(E,Ti) =Ma\ . . .a n . T\ and {ai . . .a n } = J 7 (ti)\V(ti) \T(E) (3.19) 

Let fa . . . (3 n be new variables that are out of reach of ip and are not free in E. Define a 
new substitution ip' = ip o {oi{ i— > fa}. Using induction hypothesis we haver 

<p'(E) h ai : ^'(n) (3.20) 

ip(E) + {x^ip(Gen(E,T 1 ))} h a 2 : v?(r 2 ) (3.21) 

Since no oi{ is free in ETwe have (f'(E) = (f(E). ThereforeLin order to apply the LET rule 
to the induction judgments 3.20 and 3.21 we need to show the following: 

V (Gen(E, n)) = Gen(<p'(E), ^'(n)) (3-22) 

We show this in two steps. Define V = T(lp'(ti)) \ V(lp'(ti)) \ T(y'(E)). 

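SubCase 5.1: $\{\beta_1\ldots\beta_n\} \subseteq V$. We follow the definition of $V$ given above. We have,

1. $\beta_i \in \mathcal{F}(\varphi'(\tau_1))$: From Proposition 3.8 we obtain $\mathcal{F}(\varphi'(\tau_1)) = \mathcal{F}(\varphi'(\mathcal{F}(\tau_1)))$, and for
$\alpha_i \in \mathcal{F}(\tau_1)$ we have $\mathcal{F}(\varphi'(\alpha_i)) = \mathcal{F}(\beta_i) = \beta_i$.

2. $\beta_i \notin \mathcal{D}(\varphi'(\tau_1))$: From Proposition 3.8 we obtain $\mathcal{D}(\varphi'(\tau_1)) \subseteq \mathcal{F}(\varphi'(\mathcal{D}(\tau_1))) \cup
\mathcal{D}(\varphi'(\mathcal{F}(\tau_1)))$. Now we have,

• $\beta_i \notin \mathcal{F}(\varphi'(\mathcal{D}(\tau_1)))$: From Equation 3.19, $\alpha_i \notin \mathcal{D}(\tau_1)$, and for all $\alpha \neq \alpha_i$, $\beta_i \notin
\mathcal{F}(\varphi'(\alpha))$ since the $\beta_i$ are chosen to be out of reach of $\varphi$.

• $\beta_i \notin \mathcal{D}(\varphi'(\mathcal{F}(\tau_1)))$: By definition $\mathcal{D}(\varphi'(\alpha_i)) = \mathcal{D}(\beta_i) = \emptyset$, and for all $\alpha \neq \alpha_i$,
$\beta_i \notin \mathcal{D}(\varphi'(\alpha))$.

3. $\beta_i \notin \mathcal{F}(\varphi'(E))$: From Proposition 3.8 we obtain $\mathcal{F}(\varphi'(E)) = \mathcal{F}(\varphi'(\mathcal{F}(E)))$. Now
from Equation 3.19, $\alpha_i \notin \mathcal{F}(E)$, and for all $\alpha \neq \alpha_i$, $\beta_i \notin \mathcal{F}(\varphi'(\alpha))$.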

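SubCase 5.2: $V \subseteq \{\beta_1\ldots\beta_n\}$. Suppose we have a $\beta \in \mathcal{F}(\varphi'(\tau_1))$ such that $\beta \neq \beta_i$. We
wish to show that $\beta \notin V$.

From Proposition 3.8 we obtain $\beta \in \mathcal{F}(\varphi'(\mathcal{F}(\tau_1)))$. Let $\alpha \in \mathcal{F}(\tau_1)$ be such that
$\beta \in \mathcal{F}(\varphi'(\alpha))$. Now $\alpha \neq \alpha_i$, otherwise $\beta = \mathcal{F}(\varphi'(\alpha_i)) = \mathcal{F}(\beta_i) = \beta_i$. Using Equation 3.19
we must have one of the following situations:

1. $\alpha \in \mathcal{D}(\tau_1)$: This implies that $\beta \in \mathcal{F}(\varphi'(\mathcal{D}(\tau_1))) \Rightarrow \beta \in \mathcal{D}(\varphi'(\tau_1))$ using Proposition 3.8.
It follows from the definition of $V$ that $\beta \notin V$.

2. $\alpha \in \mathcal{F}(E)$: This implies that $\beta \in \mathcal{F}(\varphi'(\mathcal{F}(E))) \Rightarrow \beta \in \mathcal{F}(\varphi'(E))$ using Proposition 3.8.
Again, it follows from the definition of $V$ that $\beta \notin V$.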

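Combining the above two cases we obtain $V = \{\beta_1\ldots\beta_n\}$. Now we have,

\[ \begin{array}{rcll}
\mathit{Gen}(\varphi'(E), \varphi'(\tau_1)) & = & \forall\beta_1\ldots\beta_n.\,\varphi'(\tau_1) & \text{by definition of generalization} \\
 & = & \varphi(\forall\alpha_1\ldots\alpha_n.\,\tau_1) & \text{by substitution over type schemes} \\
 & = & \varphi(\mathit{Gen}(E,\tau_1)) &
\end{array} \]

This is the desired result of Equation 3.22, so the LET rule can now be applied to the
induction hypotheses 3.20 and 3.21.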
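Case 6: $a$ is $(\mathtt{close}\ a)$. The typing derivation ends in the CLOSE rule:

\[ \frac{E \vdash a : \tau\ \mathit{ref}(r) \qquad r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))}{E \vdash \mathtt{close}\ a : \tau\ \mathit{ref}(\epsilon)} \]

Just as in the last case, let $r'$ be a new region variable out of reach of $\varphi$ and not free in $E$
or $\tau$. Define a new substitution $\varphi' = \varphi \circ \{r \mapsto r'\}$. Now we have,

\[ \begin{array}{lll}
 & \varphi'(E) \vdash a : \varphi'(\tau\ \mathit{ref}(r)) & \text{by induction hypothesis} \\
\Rightarrow & \varphi(E) \vdash a : (\varphi(\tau))\ \mathit{ref}(r') & \text{since } r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau)) \qquad (3.23)
\end{array} \]

It is also clear that $r' \notin (\mathcal{F}(\varphi'(E)) \cup \mathcal{F}(\varphi'(\tau)))$ since $r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))$ and $r'$ was
chosen to be out of reach of $\varphi$. Therefore, we can apply the CLOSE rule to the induction
hypothesis 3.23 to obtain the desired result.

□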

The following proposition states that a typing remains valid under a more general typing 
environment. 

Proposition 3.10 Let $a$ be an expression, $\tau$ be a type, and $E$, $E'$ be two typing environments
such that $\mathit{Dom}(E) = \mathit{Dom}(E')$, and $E'(x) \geq E(x)$ for all $x$ free in $a$. If $E \vdash a : \tau$, then
$E' \vdash a : \tau$.

Proof: by simple structural induction over $a$. The base case for the IDENT rule follows directly
from the hypothesis, since $E'(x) \geq E(x) \geq \tau$. That is, any instance of $E(x)$ is also an instance
of $E'(x)$. For the LET and CLOSE rules, we observe that $\mathcal{F}(E') \subseteq \mathcal{F}(E)$. In the LET rule,
this implies that $\mathit{Gen}(E',\tau_1) \geq \mathit{Gen}(E,\tau_1)$ and the result follows by applying the induction
hypothesis to the second antecedent. For the CLOSE rule, this implies that $r \notin \mathcal{F}(E')$ and
the result follows directly. □

3.3 Type Soundness 

3.3.1 Semantic Model 

In order to show the soundness of the typing judgments generated by the above type system
with respect to its evaluation rules, we must first precisely characterize a "consistent" semantic
relationship between value-domain entities and their corresponding type-domain entities. Since
values may contain reference locations from the store, we need to define STORE TYPINGS, which
are finite mappings from store locations to types:

\[ \text{store typings:} \quad S ::= \{l_1 \mapsto \tau_1, \ldots, l_n \mapsto \tau_n\} \]

Note that we do not allow type schemes in store typings. This clearly separates the modeling
of type generalization, which is handled entirely via type environments, from the modeling of
closing a mutable object, which is handled entirely via type conversion within the store typing.
Two store typings may be related by extension:

Definition 3.11 A store typing $S'$ extends another store typing $S$ if $\mathit{Dom}(S) \subseteq \mathit{Dom}(S')$ and
$S(l) = S'(l)$ for all $l \in \mathit{Dom}(S)$.
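
As an informal illustration of Definition 3.11, the following minimal sketch models a store typing as a Python dictionary from location names to types. The string encoding of locations and types and the name `extends` are assumptions made only for this example; they are not part of the formal system.

```python
def extends(s_new: dict, s_old: dict) -> bool:
    """Return True if store typing s_new extends s_old (Definition 3.11).

    Every location of s_old must be present in s_new with exactly the
    same type. Locations and types are plain strings here (an assumption).
    """
    return all(loc in s_new and s_new[loc] == ty for loc, ty in s_old.items())


# Hypothetical store typings used only for illustration.
S = {"l1": "int ref(r1)"}
S_bigger = {"l1": "int ref(r1)", "l2": "bool ref(r2)"}
S_changed = {"l1": "int ref(eps)", "l2": "bool ref(r2)"}

print(extends(S_bigger, S))   # adding new locations is allowed
print(extends(S_changed, S))  # changing the type of an old location is not
```

Note that extension is reflexive and transitive, which is the property the soundness proof relies on when it chains several store typings together.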

Now, we define the following consistency relationships between value-domain entities and
type-domain entities:

Definition 3.12 (Semantic Model) Let $s$ be a store, $S$ be a store typing, $e$ be an environment,
$E$ be a type environment, $v$ be a value, $\tau$ be a type, and $\sigma$ be a type scheme. Define the
following relations:

Case 1: $S \models v : \tau$. The value $v$ belongs to the type $\tau$ under the store typing $S$. The various
cases are as follows:
SubCase 1.1: $S \models c : \mathit{typeof}(c)$, where $\mathit{typeof}$ is a predefined relation between predefined
constants and their types.
SubCase 1.2: $S \models (\mathtt{n\text{-}tup}\ v_1, \ldots, v_n) : (\tau_1, \ldots, \tau_n)$, if for all $i$, $S \models v_i : \tau_i$.
SubCase 1.3: $S \models (\mathtt{clsr}\ f,x,a,e) : \tau_1 -\langle\pi\rangle\to \tau_2$, if there exists a type environment $E$ such
that $S \models e : E$ and $E \vdash (f\ \mathtt{where}\ f(x) = a) : \tau_1 -\langle\pi\rangle\to \tau_2$.
SubCase 1.4: $S \models l : \tau\ \mathit{ref}(r)$, if $l \in \mathit{Dom}(S)$ and $S(l) = \tau\ \mathit{ref}(r)$.
SubCase 1.5: $S \models l : \tau\ \mathit{ref}(\epsilon)$, if $l \in \mathit{Dom}(S)$ and there exists a substitution $\psi$ with
$\mathit{Dom}(\psi) \subseteq \mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$ such that $\psi(S(l)) = \tau\ \mathit{ref}(\epsilon)$.

Case 2: $S \models v : \sigma$. The value $v$ belongs to the type scheme $\sigma = \forall\alpha_1\ldots\alpha_n.\,\tau$ under the
store typing $S$, if none of the $\alpha_i$ belong to $\mathcal{D}(\tau)$ and if $S \models v : \varphi(\tau)$ for all substitutions $\varphi$ with
$\mathit{Dom}(\varphi) \subseteq \{\alpha_1\ldots\alpha_n\}$.

Case 3: $S \models e : E$. The values contained in the environment $e$ belong to the corresponding
type schemes in the type environment $E$ (pointwise) under the store typing $S$, if $\mathit{Dom}(e) =
\mathit{Dom}(E)$ and for all $x \in \mathit{Dom}(E)$ we have $S \models e(x) : E(x)$.

Case 4: $\models s : S$. The values contained in the store $s$ belong to the corresponding types in the
store typing $S$ (pointwise), if $\mathit{Dom}(s) = \mathit{Dom}(S)$ and for all $l \in \mathit{Dom}(S)$ we have,
SubCase 4.1: If $S(l) = \tau\ \mathit{ref}(r)$ then $s(l) = v, \mathtt{rw}$ and $S \models v : \tau$.
SubCase 4.2: If $S(l) = \tau\ \mathit{ref}(\epsilon)$ then $s(l) = v, \mathtt{ro}$ and $S \models v : \tau$.

The primary motivation of "closing" a mutable object is to be able to generalize its type to
a type scheme and use it like any other functional value in a safe manner. This is modeled in
Case 1.5 by defining a closed location to be consistent with any type obtained via a substitution
over the non-dangerous variables of the type present in the store typing. On the other hand,
in Case 1.4, a mutable location is defined to be consistent only with the exact type present
in the store typing, modeling the fact that it is allowed to have only a monomorphic type.
The one-to-one correspondence between dynamic mutability of a reference location and its type
is reflected in Cases 4.1 and 4.2. Only the locations with a read/write tag are defined to be
consistent with a mutable reference type and vice-versa.
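
The difference between Cases 1.4 and 1.5 can be sketched as a one-way matching problem. The example below is only an illustration under several assumptions: types are encoded as nested tuples with string leaves, `"eps"` stands for the null region, and the set of non-dangerous free variables $\mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$ is passed in by the caller as `allowed`, because the definition of $\mathcal{D}$ is not repeated in this section. The function names are hypothetical.

```python
def match(pattern, target, allowed, subst=None):
    """One-way matching: find psi with domain inside `allowed` such that
    psi(pattern) == target. Returns psi as a dict, or None on failure."""
    subst = {} if subst is None else subst
    if isinstance(pattern, str):            # variable, base type, or constructor name
        if pattern in allowed:
            if pattern in subst:            # variable already bound: must agree
                return subst if subst[pattern] == target else None
            subst[pattern] = target
            return subst
        return subst if pattern == target else None
    if not isinstance(target, tuple) or len(pattern) != len(target):
        return None
    for p, t in zip(pattern, target):
        if match(p, t, allowed, subst) is None:
            return None
    return subst


def consistent_location(stored, wanted, allowed):
    """Sketch of SubCases 1.4 and 1.5 for a location whose store typing is `stored`."""
    if stored[0] != "ref":
        return False
    if stored[2] != "eps":                  # mutable: exact type only (1.4)
        return stored == wanted
    return match(stored, wanted, allowed) is not None   # closed: any instance (1.5)


closed = ("ref", ("list", "a"), "eps")      # hypothetical type: a list ref(eps)
mutable = ("ref", ("list", "a"), "r")       # hypothetical type: a list ref(r)
print(consistent_location(closed, ("ref", ("list", "int"), "eps"), {"a"}))
print(consistent_location(mutable, ("ref", ("list", "int"), "r"), {"a"}))
```

In this sketch the closed location accepts the instance at `int`, while the mutable location accepts only its exact stored type, mirroring the monomorphic restriction described above.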

3.3.2 Properties of the Semantic Model 

During the course of evaluation of a program, the values contained within the store locations
may change but the types of those locations remain the same (except for the types of locations
that are currently being closed). This fact is useful in showing that a semantic relation such as
$S \models v : \tau$ that holds true at some point during evaluation remains true afterwards under any
extension of the current store typing:

Proposition 3.13 (Store Typing Extension) If $S'$ extends $S$, then $S \models v : \tau$ implies
$S' \models v : \tau$. Similarly, $S \models e : E$ implies $S' \models e : E$.

Proof: by a simple induction over $v$. The only interesting case is that for locations. The
definition of extension ensures that $S$ and $S'$ must agree exactly on the types of the locations
that are present in $S$. □






3.3.3 Type Soundness 

Before we establish the consistency of the static and the dynamic semantics in terms of a
soundness theorem, it is useful to characterize the semantic meaning of the generalization and
the closing operations in terms of the above semantic definitions.

The following proposition establishes the fact that it is semantically safe to generalize the
non-dangerous variables of a type.

Proposition 3.14 (Semantic Generalization) Let $v$ be a value, $\tau$ be a type and $S$ be a store
typing such that $S \models v : \tau$. Let $\alpha_1, \ldots, \alpha_m$ be type variables such that for all $i$, $\alpha_i \notin \mathcal{D}(\tau)$. Then,
for all substitutions $\varphi$ with $\mathit{Dom}(\varphi) \subseteq \{\alpha_1\ldots\alpha_m\}$, we have $S \models v : \varphi(\tau)$. As a consequence,
$S \models v : \forall\alpha_1\ldots\alpha_m.\,\tau$.

Proof: by structural induction over $v$. Only the case for a closed location is different from
[Ler92], but we show all cases for the sake of completeness.

Case 1: $v$ is $c$. By definition, $S \models c : \mathit{typeof}(c)$ and therefore we must have $\mathit{typeof}(c) \geq \tau$
using the hypothesis $S \models c : \tau$. Also by assumption, all predefined constants possess fully
quantified type schemes, i.e., their type schemes do not contain any free type variables.
This implies $\mathit{typeof}(c) \geq \varphi(\tau)$ and the result $S \models c : \varphi(\tau)$ follows immediately.

Case 2: $v$ is $(\mathtt{n\text{-}tup}\ v_1, \ldots, v_n)$ and $\tau$ is $\tau_1, \ldots, \tau_n$. Since $\mathcal{D}(\tau_1, \ldots, \tau_n) = \bigcup_{1 \leq j \leq n} \mathcal{D}(\tau_j)$,
we must have for all $i, j$ that $\alpha_i \notin \mathcal{D}(\tau_j)$. By the induction hypothesis it follows that for all $j$,
$S \models v_j : \varphi(\tau_j)$. The result follows from the definition of $\models$ for tuples.

Case 3: $v$ is $(\mathtt{clsr}\ f, x, a, e)$ and $\tau$ is $\tau_1 -\langle\pi\rangle\to \tau_2$. Applying the definition of $\models$ for closures,
let $E$ be the type environment such that,

\[ S \models e : E \tag{3.24} \]

\[ E \vdash (f\ \mathtt{where}\ f(x) = a) : \tau_1 -\langle\pi\rangle\to \tau_2 \tag{3.25} \]

We will show that,

\[ S \models e : \varphi(E) \tag{3.26} \]

\[ \varphi(E) \vdash (f\ \mathtt{where}\ f(x) = a) : \varphi(\tau_1 -\langle\pi\rangle\to \tau_2) \tag{3.27} \]

Equation 3.27 follows directly from Equation 3.25 using Proposition 3.9, which states that typing is
stable under substitution. Also note that $\mathit{Dom}(E) = \mathit{Dom}(e) = \mathcal{F}(f\ \mathtt{where}\ f(x) = a)$
from Equation 3.24 and the dynamic ABS rule in Figure 3.1.

In order to show Equation 3.26, we must show $S \models e(y) : \varphi(E(y))$ for all $y \in \mathit{Dom}(E)$.
For a given $y$, let $E(y) = \forall\beta_1\ldots\beta_k.\,\tau'$ where the $\beta_j$ are taken out of reach of $\varphi$ and distinct
from the $\alpha_i$. Using substitution over type schemes, we obtain $\varphi(E(y)) = \forall\beta_1\ldots\beta_k.\,\varphi(\tau')$.
Thus, in order to conclude $S \models e(y) : \varphi(E(y))$, first we have to show $S \models e(y) : \psi(\varphi(\tau'))$
for any substitution $\psi$ with $\mathit{Dom}(\psi) \subseteq \{\beta_1\ldots\beta_k\}$. This is done as follows.

From Equation 3.24 we obtain $S \models e(y) : E(y)$, which implies $S \models e(y) : \tau'$ using the
definition of $\models$ over type schemes. Now consider the substitution $\psi \circ \varphi$. Its domain is
$\{\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_k\}$. We claim that none of these variables are dangerous in $\tau'$:

• $\alpha_i \notin \mathcal{D}(\tau')$: We know that $y \in \mathcal{F}(f\ \mathtt{where}\ f(x) = a)$, so its type scheme $E(y)$
is included in the closure type $\pi$ of $\tau = \tau_1 -\langle\pi\rangle\to \tau_2$. This implies that $\mathcal{D}(E(y)) =
\mathcal{D}(\tau') \setminus \{\beta_1\ldots\beta_k\}$ is included in $\mathcal{D}(\tau) = \mathcal{D}(\pi)$. Since $\alpha_i \notin \mathcal{D}(\tau)$ by hypothesis, it
follows that $\alpha_i \notin \mathcal{D}(\tau')$ for all $i$.

• $\beta_j \notin \mathcal{D}(\tau')$: $S \models e(y) : E(y)$ from Equation 3.24 immediately implies $\beta_j \notin \mathcal{D}(\tau')$ for
all $j$.

Now we can apply the induction hypothesis to the value $e(y)$, the type $\tau'$, the variables
$\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_k$ and the substitution $\psi \circ \varphi$ to obtain $S \models e(y) : \psi(\varphi(\tau'))$. This
holds for any substitution $\psi$ over $\{\beta_1\ldots\beta_k\}$. Moreover, none of the $\beta_j$ are dangerous in $\varphi(\tau')$
since they are not dangerous in $\tau'$ and they are out of reach of $\varphi$. Therefore we obtain
$S \models e(y) : \forall\beta_1\ldots\beta_k.\,\varphi(\tau')$ by definition of $\models$ over type schemes, that is, $S \models e(y) : \varphi(E(y))$.
This holds for all $y \in \mathit{Dom}(E)$. Hence Equation 3.26 is satisfied and we obtain the desired
result.

Case 4: $v$ is $l$ and $\tau$ is $\tau_1\ \mathit{ref}(r)$. Here, $\mathcal{D}(\tau) = \mathcal{F}(\tau)$. Since no $\alpha_i$ is dangerous in $\tau$ by
hypothesis, it follows that no $\alpha_i$ can be free in $\tau$. Thus, $\varphi(\tau) = \tau$ and the result follows
immediately from the hypothesis that $S \models v : \tau$.

Case 5: $v$ is $l$ and $\tau$ is $\tau_1\ \mathit{ref}(\epsilon)$. Applying the definition of $\models$ for non-mutable locations,
let $\psi$ be the substitution with $\mathit{Dom}(\psi) \subseteq \mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$ such that $\tau = \psi(S(l))$, thereby
implying $\varphi(\tau) = \varphi(\psi(S(l)))$. Also, no $\alpha_i \in \mathit{Dom}(\varphi)$ is dangerous in $S(l)$. Otherwise,
it would surely be dangerous in $\tau = \psi(S(l))$ from Proposition 3.8, which contradicts the
hypothesis.

Consider the substitution $\varphi \circ \psi$ restricted to the domain $X = \mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$. From
the above remarks it is clear that we still have $(\varphi \circ \psi)|_X(S(l)) = \varphi(\tau)$. Thus, we can
apply the definition of $\models$ for the location $l$ using the substitution $(\varphi \circ \psi)|_X$ to conclude
$S \models l : \varphi(\tau)$ as desired.

□

The following proposition establishes a correspondence between the dangerous regions of 
a type and the mutable locations that are reachable from a value possessing that type. This 
allows us to use dangerous regions as a safe static abstraction for mutable locations. 

Proposition 3.15 (Region Abstraction) Let $s$ be a store, and $S$ be a store typing such that
$\models s : S$. Then we have,

\[ S \models v : \tau \;\Rightarrow\; \Bigl( \bigcup_{l \in \mathit{Reachable}(v,s)} \mathcal{R}(S(l)) \Bigr) \subseteq \mathcal{R}(\tau) \]

That is, the dangerous regions contained in the types of reachable locations of a value are
dangerous in the type of the value. Using pointwise extension to environments we also have,

\[ S \models e : E \;\Rightarrow\; \Bigl( \bigcup_{l \in \mathit{Reachable}(e,s)} \mathcal{R}(S(l)) \Bigr) \subseteq \mathcal{R}(E) \]

Proof: by induction on the depth of reachability of a location $l$ in the value $v$. First, we define
a family of reachability functions $\mathit{Reachable}^i(v, s)$ as follows:

\[ \mathit{Reachable}^0(v,s) = \mathcal{L}(v) \tag{3.28} \]

\[ \mathit{Reachable}^{i+1}(v,s) = \mathit{Reachable}^i(v,s) \cup \Bigl( \bigcup_{l \in \mathit{Reachable}^i(v,s)} \mathcal{L}(\mathit{value}(s(l))) \Bigr) \tag{3.29} \]

By definition, $\mathit{Reachable}(v, s)$ is the limit of the increasing chain of sets $\mathit{Reachable}^0(v, s) \subseteq
\mathit{Reachable}^1(v, s) \subseteq \cdots$. Since the number of locations reachable from a value is finite, this
chain is guaranteed to reach the limit at a finite $i$. We will show that for all $i$,

\[ \Bigl( \bigcup_{l \in \mathit{Reachable}^i(v,s)} \mathcal{R}(S(l)) \Bigr) \subseteq \mathcal{R}(\tau) \tag{3.30} \]

Base Case: Using Equation 3.28, we need to show that for all locations $l \in \mathcal{L}(v)$, we have
$\mathcal{R}(S(l)) \subseteq \mathcal{R}(\tau)$. This is shown by induction on the structure of $v$ using the definition of
$S \models v : \tau$.

Case 1: $v$ is $c$. Trivial, since there are no locations reachable from a constant.

Case 2: $v$ is $(\mathtt{n\text{-}tup}\ v_1, \ldots, v_n)$ and $\tau$ is $\tau_1, \ldots, \tau_n$. Follows immediately from the definition
of $\models$ for tuples and the induction hypothesis for each $v_i$.

Case 3: $v$ is $(\mathtt{clsr}\ f, x, a, e)$ and $\tau$ is $\tau_1 -\langle\pi\rangle\to \tau_2$. From the definition of $\models$ for closures
we obtain that there exists a type environment $E$ such that,

\[ S \models e : E \quad\text{and}\quad E \vdash (f\ \mathtt{where}\ f(x) = a) : \tau_1 -\langle\pi\rangle\to \tau_2 \tag{3.31} \]

Applying the definition of reachability for $(\mathtt{clsr}\ f,x,a,e)$ and the induction hypothesis
for environments we obtain:

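\[ \Bigl( \bigcup_{l \in \mathcal{L}(\mathtt{clsr}\ f,x,a,e)} \mathcal{R}(S(l)) \Bigr) = \Bigl( \bigcup_{l \in \mathcal{L}(e)} \mathcal{R}(S(l)) \Bigr) \subseteq \mathcal{R}(E) \tag{3.32} \]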

The desired result follows by noticing that $\mathcal{R}(E) \subseteq \mathcal{R}(\tau)$ since $\mathit{Dom}(E) = \mathit{Dom}(e) =
\mathcal{F}(f\ \mathtt{where}\ f(x) = a)$ and all the type schemes in $\mathit{CoDom}(E)$ are included in the closure
type of $\tau$ by construction.

Case 4: $v$ is $l$ and $\tau$ is $\tau_1\ \mathit{ref}(r)$. Follows immediately from the definition of $\models$ for
mutable locations since $S(l) = \tau$.

Case 5: $v$ is $l$ and $\tau$ is $\tau_1\ \mathit{ref}(\epsilon)$. From the definition of $\models$ for non-mutable locations
we have $\psi(S(l)) = \tau$. But the domain of $\psi$ does not include any dangerous variables of
$S(l)$, so we must have $\mathcal{R}(S(l)) \subseteq \mathcal{R}(\psi(S(l))) = \mathcal{R}(\tau)$ as desired.
Induction Case: We assume the hypothesis for $i$,

\[ \Bigl( \bigcup_{l \in \mathit{Reachable}^i(v,s)} \mathcal{R}(S(l)) \Bigr) \subseteq \mathcal{R}(\tau) \tag{3.33} \]

From Equation 3.29, the locations in $\mathit{Reachable}^i(v, s)$ are already covered via the above
hypothesis. Given a location $l \in \mathit{Reachable}^i(v, s)$, let $\mathit{value}(s(l)) = v'$ and $S(l) = \tau'\ \mathit{ref}(\rho)$.
Using the hypothesis $\models s : S$, we have $S \models v' : \tau'$. Therefore, we can apply the base case
induction as above and obtain for all $l' \in \mathcal{L}(v')$, $\mathcal{R}(S(l')) \subseteq \mathcal{R}(\tau')$. This immediately extends
to $\mathcal{R}(S(l')) \subseteq \mathcal{R}(\tau)$, since $\tau'$ is contained in $S(l)$ and $\mathcal{R}(S(l)) \subseteq \mathcal{R}(\tau)$ from Equation 3.33.

□
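
The reachability functions of Equations 3.28 and 3.29 amount to a fixpoint iteration over the store. The sketch below illustrates this under an assumed value encoding that is not part of the formal system: locations are `Loc` objects, tuples are Python tuples, an environment (or the environment of a closure) is a dictionary, and a store maps each location to a `(value, tag)` pair. The names `locations` and `reachable` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A store location (assumed encoding)."""
    name: str


def locations(v):
    """Locations occurring directly in v, without following the store."""
    if isinstance(v, Loc):
        return {v}
    if isinstance(v, tuple):
        return set().union(*(locations(x) for x in v))
    if isinstance(v, dict):                      # environment: look at bound values
        return set().union(*(locations(x) for x in v.values()))
    return set()                                 # constants contain no locations


def reachable(v, store):
    """Limit of the increasing chain Reachable^0 <= Reachable^1 <= ..."""
    reached = locations(v)                       # Reachable^0 (Equation 3.28)
    while True:
        step = set(reached)
        for loc in reached:                      # one unfolding (Equation 3.29)
            step |= locations(store[loc][0])
        if step == reached:                      # chain has stabilized
            return reached
        reached = step


store = {
    Loc("a"): ((1, Loc("b")), "rw"),
    Loc("b"): (2, "ro"),
    Loc("c"): (3, "rw"),
}
print(reachable((Loc("a"), 0), store))           # locations a and b, but not c
```

The loop terminates because each round either adds a location from the finite store or stops, which is the same finiteness argument used for the chain above.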

The semantic consistency between the static and the dynamic semantics can now be stated
in the form of the soundness theorem given below. It is proved using induction on the size of
the evaluation derivation, doing a case analysis on $a$ and hence on the last rule used in the typing
derivation.

The soundness of the close operation relies on the fact that it only closes fresh and
non-escaping locations, i.e., locations that are neither present in the initial store, nor
accessible from the current environment or the returned result. The former is a property of the
dynamic rules (Proposition 3.5) and the latter is ensured by the side condition on the static
CLOSE rule and Proposition 3.15.

Theorem 3.16 (Type Soundness) Let $a$ be an expression, $\tau$ be a type, $E$ be a type environment,
$e$ be an evaluation environment, $s$ be an initial store, and $S$ be a store typing such
that:

\[ E \vdash a : \tau \quad\text{and}\quad S \models e : E \quad\text{and}\quad \models s : S \]

If there exists a result $r$ such that $e \vdash a/s \Rightarrow r$, then $r \neq \mathtt{err}$; instead $r = v/s'$ for some value
$v$ and a resulting store $s'$, and there exists a store typing $S'$ such that:

\[ S' \text{ extends } S \quad\text{and}\quad S' \models v : \tau \quad\text{and}\quad \models s' : S' \]

Proof: by induction on the size of the evaluation derivation. We argue by case analysis on $a$ and
hence on the last rule used in the typing derivation. Again, only the case for the CLOSE rule
is different from [Ler92], but we show all cases.
Case 1: Constants. The typing rule is:

\[ \frac{\mathit{typeof}(c) \geq \tau}{E \vdash c : \tau} \]

The only possible evaluation is $e \vdash c/s \Rightarrow c/s$. By definition of $\models$ for constants, we have
$S \models c : \mathit{typeof}(c)$, which implies $S \models c : \tau$ since $\mathit{typeof}(c) \geq \tau$. We conclude with $S' = S$.

Case 2: Variables. The typing rule is:

\[ \frac{x \in \mathit{Dom}(E) \qquad E(x) \geq \tau}{E \vdash x : \tau} \]

From the hypothesis $S \models e : E$ it follows that $x \in \mathit{Dom}(e)$ and $S \models e(x) : E(x)$. Thus,
the only possible evaluation is $e \vdash x/s \Rightarrow e(x)/s$. By definition of $\models$ for type schemes,
$S \models e(x) : E(x)$ implies $S \models e(x) : \tau$. We conclude with $S' = S$.
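Case 3: Function Abstraction. The typing rule is:

\[ \frac{\begin{array}{c}
\{y_1 \ldots y_n\} = \mathcal{F}(f\ \mathtt{where}\ f(x) = a) \\
E + \{f \mapsto \tau_1 -\langle E(y_1), \ldots, E(y_n), \pi\rangle\to \tau_2,\; x \mapsto \tau_1\} \vdash a : \tau_2
\end{array}}{E \vdash (f\ \mathtt{where}\ f(x) = a) : \tau_1 -\langle E(y_1), \ldots, E(y_n), \pi\rangle\to \tau_2} \]

The only possible evaluation is $e \vdash (f\ \mathtt{where}\ f(x) = a)/s \Rightarrow (\mathtt{clsr}\ f,x,a,e|_Y)/s$ where
$Y = \{y_1 \ldots y_n\}$. Using the definition of $\models$ for closures, we have $S \models (\mathtt{clsr}\ f,x,a,e|_Y) :
\tau_1 -\langle\pi\rangle\to \tau_2$, taking $E|_Y$ to be the desired type environment. We conclude with $S' = S$.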
Case 4: Function Application. The typing rule is:

\[ \frac{E \vdash a_1 : \tau_1 -\langle\pi\rangle\to \tau_2 \qquad E \vdash a_2 : \tau_1}{E \vdash a_1\ a_2 : \tau_2} \]

We claim that evaluations leading to $\mathtt{err}$ are not possible and that the following evaluation
rule applies:

\[ \frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow (\mathtt{clsr}\ f, x, a_0, e_0)/s_1 \\
e \vdash a_2/s_1 \Rightarrow v_2/s_2 \\
e_0 + \{f \mapsto (\mathtt{clsr}\ f, x, a_0, e_0),\; x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s'
\end{array}}{e \vdash (a_1\ a_2)/s \Rightarrow v/s'} \]

This is shown as follows:

Using the induction hypothesis on $a_1$, we obtain that it cannot evaluate to $\mathtt{err}$; instead
it must evaluate to a closure, $e \vdash a_1/s \Rightarrow (\mathtt{clsr}\ f, x, a_0, e_0)/s_1$, with a store typing $S_1$ such
that:

\[ S_1 \models (\mathtt{clsr}\ f, x, a_0, e_0) : \tau_1 -\langle\pi\rangle\to \tau_2 \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \tag{3.34} \]

Since $S_1$ extends $S$, we have $S_1 \models e : E$ from Proposition 3.13. Thus, we can use the
induction hypothesis on $a_2$ with store $s_1 : S_1$ and obtain that it evaluates to a proper value
as well, $e \vdash a_2/s_1 \Rightarrow v_2/s_2$, with a store typing $S_2$ such that:

\[ S_2 \models v_2 : \tau_1 \quad\text{and}\quad \models s_2 : S_2 \quad\text{and}\quad S_2 \text{ extends } S_1 \tag{3.35} \]

Applying the definition of $\models$ to the first clause in Equation 3.34, we obtain that there
exists a type environment $E_0$ such that:

\[ S_1 \models e_0 : E_0 \tag{3.36} \]

\[ \begin{array}{ll}
\text{and} & E_0 \vdash (f\ \mathtt{where}\ f(x) = a_0) : \tau_1 -\langle\pi\rangle\to \tau_2 \\
\Rightarrow & E_0 + \{f \mapsto \tau_1 -\langle\pi\rangle\to \tau_2,\; x \mapsto \tau_1\} \vdash a_0 : \tau_2 \qquad (3.37)
\end{array} \]

Now consider the following environments:

\[ e_2 = e_0 + \{f \mapsto (\mathtt{clsr}\ f, x, a_0, e_0),\; x \mapsto v_2\} \quad\text{and}\quad E_2 = E_0 + \{f \mapsto \tau_1 -\langle\pi\rangle\to \tau_2,\; x \mapsto \tau_1\} \]

Using Proposition 3.13 and Equations 3.34, 3.35, and 3.36, we obtain $S_2 \models e_2 : E_2$.
Therefore, we can apply the induction hypothesis to the typing judgment 3.37 and the
store $s_2 : S_2$. We obtain the evaluation $e_2 \vdash a_0/s_2 \Rightarrow v/s'$ with a store typing $S'$ such that:

\[ S' \models v : \tau_2 \quad\text{and}\quad \models s' : S' \quad\text{and}\quad S' \text{ extends } S_2 \tag{3.38} \]

This shows that $a_0$ in the third premise of the evaluation rule given above also evaluates
to a proper value, and we obtain the desired result since $S'$ extends $S$ by transitivity.

Case 5: Tuple Construction. Same argument as above.

Case 6: let-binding. The typing rule is:

\[ \frac{E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{Gen}(E,\tau_1)\} \vdash a_2 : \tau_2}{E \vdash \mathtt{let}\ x = a_1\ \mathtt{in}\ a_2 : \tau_2} \]

Again, we claim that evaluations leading to $\mathtt{err}$ are not possible and the last step in the
evaluation derivation is:

\[ \frac{e \vdash a_1/s \Rightarrow v_1/s_1 \qquad e + \{x \mapsto v_1\} \vdash a_2/s_1 \Rightarrow v_2/s'}{e \vdash (\mathtt{let}\ x = a_1\ \mathtt{in}\ a_2)/s \Rightarrow v_2/s'} \]

This is shown as follows:

Using the induction hypothesis on $a_1$, we obtain that it does not evaluate to $\mathtt{err}$; instead
$e \vdash a_1/s \Rightarrow v_1/s_1$ with the store typing $S_1$ such that:

\[ S_1 \models v_1 : \tau_1 \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \tag{3.39} \]

Using Proposition 3.14, we have $S_1 \models v_1 : \mathit{Gen}(E,\tau_1)$ since the $\mathit{Gen}$ operator does not
generalize any dangerous variables in $\tau_1$. Now, consider the following environments:

\[ e_1 = e + \{x \mapsto v_1\} \quad\text{and}\quad E_1 = E + \{x \mapsto \mathit{Gen}(E,\tau_1)\} \]

Since $S_1$ extends $S$, we obtain $S_1 \models e_1 : E_1$. Therefore, we can apply the induction
hypothesis to the second premise of the typing rule and the store $s_1 : S_1$ to obtain
$e_1 \vdash a_2/s_1 \Rightarrow v_2/s'$ with the store typing $S'$ such that:

\[ S' \models v_2 : \tau_2 \quad\text{and}\quad \models s' : S' \quad\text{and}\quad S' \text{ extends } S_1 \tag{3.40} \]

This is the desired result.
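Case 7: Reference Creation. The PRIMAPP typing rule instantiates to:

\[ \frac{\forall t, u, r.\; t -\langle u\rangle\to t\ \mathit{ref}(r) \;\geq\; \tau -\langle\pi\rangle\to \tau\ \mathit{ref}(r) \qquad E \vdash a : \tau}{E \vdash \mathtt{ref}(a) : \tau\ \mathit{ref}(r)} \]

The evaluation must end up with:

\[ \frac{e \vdash a/s \Rightarrow v/s_1 \qquad l \notin \mathit{Dom}(s_1)}{e \vdash \mathtt{ref}(a)/s \Rightarrow l/(s_1 + \{l \mapsto v, \mathtt{rw}\})} \]

By the induction hypothesis applied to $a$, we obtain a store typing $S_1$ such that:

\[ S_1 \models v : \tau \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \tag{3.41} \]

Let us define,

\[ s' = s_1 + \{l \mapsto v, \mathtt{rw}\} \quad\text{and}\quad S' = S_1 + \{l \mapsto \tau\ \mathit{ref}(r)\} \]

Since $\mathit{Dom}(s_1) = \mathit{Dom}(S_1)$, we have $l \notin \mathit{Dom}(S_1)$. Hence, $S'$ extends $S_1$ and therefore $S$.
Using this fact on the first clause of Equation 3.41, we obtain $S' \models v : \tau$, which allows us
to conclude from the definition of $\models$ that $S' \models l : (\tau\ \mathit{ref}(r))$ and $\models s' : S'$.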

Case 8: Dereferencing. We show the case for dereferencing a non-mutable location. The
case of dereferencing a mutable location is similar. The PRIMAPP typing rule instantiates
to:

\[ \frac{\forall t, u.\; t\ \mathit{ref}(\epsilon) -\langle u\rangle\to t \;\geq\; \tau\ \mathit{ref}(\epsilon) -\langle\pi\rangle\to \tau \qquad E \vdash a : \tau\ \mathit{ref}(\epsilon)}{E \vdash\ !a : \tau} \]

By the induction hypothesis applied to $a$, we obtain that it must evaluate to a location,
$e \vdash a/s \Rightarrow l/s_1$, with a store typing $S_1$ such that:

\[ S_1 \models l : \tau\ \mathit{ref}(\epsilon) \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \tag{3.42} \]

Also, $l \in \mathit{Dom}(s_1)$ because the first clause above implies that $l \in \mathit{Dom}(S_1)$ and $\mathit{Dom}(s_1) =
\mathit{Dom}(S_1)$ from the second clause. Thus, the only possibility for evaluation is:

\[ \frac{e \vdash a/s \Rightarrow l/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{value}(s_1(l)) = v}{e \vdash\ !a/s \Rightarrow v/s_1} \]

Applying the definition of $\models$ for non-mutable locations to the first clause in Equation 3.42,
we obtain that there exists a substitution $\psi$ with $\mathit{Dom}(\psi) \subseteq \mathcal{F}(S_1(l)) \setminus \mathcal{D}(S_1(l))$ such that
$\psi(S_1(l)) = \tau\ \mathit{ref}(\epsilon)$. Thus, $S_1(l)$ must be of the form:

\[ S_1(l) = \tau'\ \mathit{ref}(\epsilon) \quad\text{with}\quad \psi(\tau') = \tau \]

This is because all locations must have reference types and we never substitute the null
region for region variables. From the definition of $\models s_1 : S_1$ for location $l$ it follows that
$S_1 \models v : \tau'$. Since $\mathit{Dom}(\psi)$ does not include any dangerous variables in $S_1(l)$ and hence in
$\tau'$, we can apply Proposition 3.14 to the substitution $\psi$ and obtain $S_1 \models v : \psi(\tau')$. This is the
desired result taking $S' = S_1$.






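Case 9: Assignment. The PRIMAPP typing rule instantiates to:

\[ \frac{\forall t, u, r.\; (t\ \mathit{ref}(r), t) -\langle u\rangle\to \mathit{unit} \;\geq\; (\tau\ \mathit{ref}(r), \tau) -\langle\pi\rangle\to \mathit{unit} \qquad E \vdash a : \tau\ \mathit{ref}(r), \tau}{E \vdash\ :=(a) : \mathit{unit}} \]

As in the previous case, the evaluation must end up with:

\[ \frac{e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \mathtt{rw}}{e \vdash\ :=(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \mathtt{rw}\})} \]

By the induction hypothesis applied to $a$, we get a store typing $S_1$ such that:

\[ S_1 \models (l, v) : \tau\ \mathit{ref}(r), \tau \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \tag{3.43} \]

This implies $S_1 \models v : \tau$ and $S_1(l) = \tau\ \mathit{ref}(r)$ from the definition of $\models$ for tuples and
mutable locations. Letting $S' = S_1$ and $s' = s_1 + \{l \mapsto v, \mathtt{rw}\}$, we therefore obtain $\models s' : S'$
using the definition of $\models$. Finally, we check that $S' \models () : \mathit{unit}$ and obtain the desired
result.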
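Case 10: close expression. The typing rule is:

\[ \frac{E \vdash a : \tau\ \mathit{ref}(r) \qquad r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))}{E \vdash \mathtt{close}\ a : \tau\ \mathit{ref}(\epsilon)} \]

Using the induction hypothesis on $a$, we obtain $e \vdash a/s \Rightarrow l/s_1$ with the store typing $S_1$
such that:

\[ S_1 \models l : \tau\ \mathit{ref}(r) \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \tag{3.44} \]

From the first two clauses above and the definition of $\models$ for mutable locations, we obtain,

\[ l \in \mathit{Dom}(S_1) \qquad S_1(l) = \tau\ \mathit{ref}(r) \qquad s_1(l) = v, \mathtt{rw} \quad\text{and}\quad S_1 \models v : \tau \tag{3.45} \]

Thus, the CLOSE evaluation rule applies:

\[ \frac{\begin{array}{c}
e \vdash a/s \Rightarrow l/s_1 \qquad s_1(l) = v, \mathtt{rw} \\
L = \mathit{Reachable}(l, s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in \mathit{Dom}(s)} \mathit{Reachable}(l', s_1)
\end{array}}{e \vdash (\mathtt{close}\ a)/s \Rightarrow l/(s_1|_L + \{l \mapsto v, \mathtt{ro}\})} \]

Let us now define,

\[ s' = s_1|_L + \{l \mapsto v, \mathtt{ro}\} \quad\text{and}\quad S' = S_1|_L + \{l \mapsto \tau\ \mathit{ref}(\epsilon)\} \tag{3.46} \]

Now, we have to show the following:

\[ S' \models l : \tau\ \mathit{ref}(\epsilon) \quad\text{and}\quad \models s' : S' \quad\text{and}\quad S' \text{ extends } S \tag{3.47} \]

The first clause follows directly from the definition of $\models$ for non-mutable locations since
we have chosen $l \in \mathit{Dom}(S')$ and $S'(l) = \tau\ \mathit{ref}(\epsilon)$.

Next, we show that $S'$ extends $S$. Note that $S_1$ extends $S$ from Equation 3.44 and
$\mathit{Dom}(S) \subseteq \mathit{Dom}(S')$ by construction. Therefore, $S'$ will extend $S$ if $l \notin \mathit{Dom}(S)$, since
that is the only location at which $S_1$ and $S'$ differ. This is shown as follows.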






Suppose for the moment that $l \in \mathit{Dom}(S)$. Since $\models s : S$ by hypothesis, we have
$l \in \mathit{Dom}(s)$. Applying Proposition 3.5 to the evaluation $e \vdash a/s \Rightarrow l/s_1$ we conclude that
$l \in \mathit{Reachable}(e,s)$. Also, since $S_1$ extends $S$, we obtain that $S(l) = S_1(l) = \tau\ \mathit{ref}(r)$.
Finally, using Proposition 3.15 for the hypothesis $S \models e : E$, we must have $r \in \mathcal{R}(S(l)) \subseteq
\mathcal{R}(E) \subseteq \mathcal{F}(E)$, which contradicts the condition $r \notin \mathcal{F}(E)$ in the typing rule.

As the final step in proving Equation 3.47, we have to show $\models s' : S'$. By construction,
we have $\mathit{Dom}(s') = \mathit{Dom}(S')$, and at location $l$, $s'(l)$ has the read-only tag, which is consistent
with $S'(l)$ pointing to a null region. At locations $l' \in \mathit{Dom}(S')$ other than $l$, the tags
in $s'$ are already consistent with the corresponding regions in $S'$ since they are directly
copied from $s_1 : S_1$. Next, we have to show that for all locations $l' \in \mathit{Dom}(S')$ such that
$\mathit{value}(s'(l')) = v'$ and $S'(l') = \tau'\ \mathit{ref}(\rho)$, we have,

\[ S_1 \models v' : \tau' \;\Rightarrow\; S' \models v' : \tau' \tag{3.48} \]

This can be shown by a simple structural induction on $v'$. Only the case for locations
is interesting. By construction, the store $s'$ is closed under reachability, so there is no
possibility of encountering undefined locations within $v'$, and for locations other than $l$, we
already have $S'(l') = S_1(l')$.

The only problem is that if $v'$ contained $l$ (the location being closed), then $\tau'$ would still
contain the region variable $r$ because $S_1(l) = \tau\ \mathit{ref}(r)$. But this region has been closed in
$S'$, making $S' \models v' : \tau'$ inconsistent. Thus, $l$ should not be contained in $v'$. This is where
the domain restriction on the store $s'$ proves useful. We show below a stronger condition:
that the location $l$ is not reachable from any value $v'$ present in the store $s'$. Specifically,
we will show that $l \notin \mathit{Reachable}(v', s_1)$, which implies $l \notin \mathit{Reachable}(v', s')$.

Let us assume for the purpose of contradiction that $l \in \mathit{Reachable}(v', s_1)$. Looking at
the components of $\mathit{Dom}(s')$ given by Equation 3.46, the following possibilities arise for
$l' \in \mathit{Dom}(s')$:



1. $l' = l$: Then $v' = v$ and hence $l \in \mathit{Reachable}(v,s_1)$ by assumption. Now, we
apply Proposition 3.15 to $S_1 \models v : \tau$ taken from Equation 3.45 to conclude that
$r \in \mathcal{R}(S_1(l)) \subseteq \mathcal{R}(\tau) \subseteq \mathcal{F}(\tau)$, which contradicts the condition $r \notin \mathcal{F}(\tau)$ in the typing
rule.

2. $l' \neq l$ but $l' \in \mathit{Reachable}(l, s_1)$: This immediately implies $l' \in \mathit{Reachable}(v,s_1)$ since
the location $l$ contains the value $v$ in both $s_1$ and $s'$. Together with the assumption
$l \in \mathit{Reachable}(v', s_1)$ and transitivity of reachability, we obtain $l \in \mathit{Reachable}(v, s_1)$,
which leads to a contradiction as shown in the previous case.

3. $l' \in \mathit{Reachable}(e,s_1)$: Using the assumption $l \in \mathit{Reachable}(v', s_1)$ and the fact that
$l'$ contains $v'$ in $s_1$, we obtain by transitivity that $l \in \mathit{Reachable}(e,s_1)$. Applying
Proposition 3.15 to $S_1 \models e : E$ derived from Equation 3.44 and Proposition 3.13, we
conclude that $r \in \mathcal{R}(S_1(l)) \subseteq \mathcal{R}(E) \subseteq \mathcal{F}(E)$, which contradicts the condition $r \notin \mathcal{F}(E)$
in the typing rule.

4. $l' \in \mathit{Reachable}(\mathit{Dom}(s), s_1)$: We know that $l$ was not reachable from any value present
in the domain of $s$ initially, i.e., $l \notin \mathit{Reachable}(\mathit{Dom}(s), s)$, because we have already shown
that $l \notin \mathit{Dom}(s)$ while showing that $S'$ extends $S$. Thus, the only way $l$ could become
reachable from $\mathit{Dom}(s)$ after the evaluation $e \vdash a/s \Rightarrow v/s_1$ is if some location
in $\mathit{Dom}(s)$ was assigned a new value from which $l$ was reachable. Without loss of
generality, let us assume that location is $l'$ and the newly assigned value is $v'$, i.e.,

\[ \exists\, l' \in \mathit{Dom}(s) : \mathit{value}(s(l')) \neq \mathit{value}(s_1(l')) = v' \quad\text{and}\quad l \in \mathit{Reachable}(v', s_1) \tag{3.49} \]

Since location $l'$ was modified during the evaluation $e \vdash a/s \Rightarrow v/s_1$, we can apply
Proposition 3.6 to conclude that $l' \in \mathit{Reachable}(e,s)$. Applying Proposition 3.15 to the
hypothesis $S \models e : E$, we obtain that $\mathcal{R}(S(l')) \subseteq \mathcal{R}(E)$, which extends to $\mathcal{R}(S_1(l')) \subseteq
\mathcal{R}(E)$ since $S_1$ extends $S$.

On the other hand, from Equation 3.48 we already have $S_1 \models v' : \tau'$ where $S_1(l') =
\tau'\ \mathit{ref}(r')$ for some region variable $r'$. Applying Proposition 3.15 in this case for the
location $l \in \mathit{Reachable}(v', s_1)$ we obtain $r \in \mathcal{R}(S_1(l)) \subseteq \mathcal{R}(\tau') \subseteq \mathcal{R}(S_1(l'))$. Combining
this with the result obtained in the last paragraph, we conclude $r \in \mathcal{R}(E) \subseteq \mathcal{F}(E)$,
which contradicts the condition $r \notin \mathcal{F}(E)$ in the typing rule.

This proves that $l$ is not contained in any value $v'$ present in the store $s'$, which implies
that $\models s' : S'$. Thus, all the clauses of claim 3.47 are true and we have the desired result.

□

The soundness theorem immediately leads us to the following corollary, which guarantees that
closed reference locations are never updated.

Corollary 3.17 (Non-Mutability of Closed Locations) Let $a$ be an expression fragment
within a type correct program $p$ such that $E \vdash (\mathtt{close}\ a) : \tau\ \mathit{ref}(\epsilon)$ and $e \vdash (\mathtt{close}\ a)/s \Rightarrow l/s'$.
Then, the location $l$ is never updated during the evaluation of the rest of the program.

Proof: The dynamic CLOSE rule (Figure 3.1) ensures that $\mathit{tag}(s'(l)) = \mathtt{ro}$. The ASSIGN rule
requires a $\mathtt{rw}$ tag for the location to be updated, and there is no other rule that converts the
tag of a location from $\mathtt{ro}$ to $\mathtt{rw}$. Thus, as long as the program $p$ does not illegally attempt to
update the location $l$ and run into a dynamic error, the location $l$ cannot be modified. This
condition is guaranteed by the soundness theorem since the program is well-typed. □
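
The dynamic side of this argument can be sketched with a tagged store. The example below is an illustration only: the store is assumed to be a dictionary from location names to `(value, tag)` pairs, `assign` and `close_location` are hypothetical names, and the restriction of the store to the reachable set $L$ performed by the real CLOSE rule is deliberately omitted.

```python
class AssignError(Exception):
    """Stands in for the dynamic error `err` raised by an illegal update."""


def assign(store, loc, value):
    """ASSIGN sketch: only locations tagged 'rw' may be updated."""
    _, tag = store[loc]
    if tag != "rw":
        raise AssignError(f"location {loc} is read-only")
    store[loc] = (value, "rw")


def close_location(store, loc):
    """CLOSE sketch: retag a mutable location as read-only.

    The store restriction to reachable locations is omitted here.
    """
    value, tag = store[loc]
    if tag != "rw":
        raise AssignError(f"location {loc} is already closed")
    store[loc] = (value, "ro")


store = {"l": (1, "rw")}
assign(store, "l", 2)          # allowed: the location is still mutable
close_location(store, "l")     # tag becomes 'ro'
try:
    assign(store, "l", 3)      # rejected: no rule turns 'ro' back into 'rw'
except AssignError as exc:
    print("rejected:", exc)
```

In a well-typed program the rejected branch is never taken, since the soundness theorem rules out evaluations that end in an error.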

Corollary 3.17 may be generalized to arbitrary objects with a completely closed type. This
allows us to conclude that mutable objects, once successfully closed, can no longer be modified
and therefore behave functionally.

Corollary 3.18 (Non-Mutability of Closed Objects) Let $a$ be an expression fragment within
a type correct program $p$ such that $E \vdash a : \tau$ where $\mathcal{R}(\tau) = \emptyset$ and $e \vdash a/s \Rightarrow v/s'$. Then, no
location $l \in \mathit{Reachable}(v, s')$ is updated during the evaluation of the rest of the program.

Proof: Using the soundness theorem we know that the evaluation of $p$ (and hence $a$) does not
lead to an error and there exists a store typing $S'$ such that $S' \models v : \tau$ and $\models s' : S'$. We claim that for all
locations $l \in \mathit{Reachable}(v,s')$ we must have $\mathit{tag}(s'(l)) = \mathtt{ro}$. Otherwise, from Definition 3.12
Case 4 it follows that there exists a region variable $r_1$ such that $S'(l) = \tau_1\ \mathit{ref}(r_1)$. Then,
using Proposition 3.15 it follows that $r_1 \in \mathcal{R}(\tau)$, which contradicts the hypothesis $\mathcal{R}(\tau) = \emptyset$.

Now, sound uses of the ASSIGN rule in Figure 3.1 require that the tag of the location being
assigned should be $\mathtt{rw}$. Furthermore, there is no rule that converts the tag of a location from
$\mathtt{ro}$ to $\mathtt{rw}$. Therefore, no assignments are possible on any location $l \in \mathit{Reachable}(v,s')$ during
the evaluation of the rest of the program. □

Note that Corollary 3.17 is not a special case of Corollary 3.18 because Corollary 3.17 
guarantees the non-mutability of a single closed location even if the locations reachable from 
within it are mutable. On the other hand, Corollary 3.18 only deals with objects that have
completely closed types in order to guarantee that none of the locations reachable from them 
are mutable. 
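
The difference between the two corollaries can be illustrated with a small executable sketch. The following Python model is only an assumption-laden approximation of the tagged store of Figure 3.1: the store is a dictionary from location names to (value, tag) pairs, and the names `assign`, `close_location`, and `WriteError` are hypothetical, not taken from the thesis.

```python
class WriteError(Exception):
    """Raised when a read-only (closed) location is assigned to."""

def assign(store, loc, value):
    # ASSIGN needs the rw tag; no rule ever turns ro back into rw.
    _, tag = store[loc]
    if tag != "rw":
        raise WriteError(f"location {loc} is closed")
    store[loc] = (value, "rw")

def close_location(store, loc):
    # CLOSE on one location: keep the value, flip the tag to ro.
    # (Assumption: closing an already closed location is treated as an error.)
    value, tag = store[loc]
    if tag != "rw":
        raise WriteError(f"location {loc} is already closed")
    store[loc] = (value, "ro")
    return loc

# l1 holds a reference to l2; only l1 is closed (the Corollary 3.17 situation).
store = {"l1": (("ref", "l2"), "rw"), "l2": (5, "rw")}
close_location(store, "l1")
assign(store, "l2", 6)  # still allowed: l2 was never closed
```

Here the closed location l1 can no longer be updated, while the location l2 reachable from it remains mutable. Corollary 3.18 would apply only if every reachable location carried the ro tag.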






3.4 Type Inference 

Finally, our type system admits a type inference algorithm Infer that infers principal types for
expressions. This algorithm is a direct extension of the one described in Leroy's thesis [Ler92]
to region variables. We only need to ensure that region variables are allowed to be unified only
with other region variables and never with the null region ($\epsilon$). This guarantees that we do not
accidentally "close" a mutable reference type by unification. That operation should only be
performed explicitly using the close construct.
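
The restriction on region unification can be sketched as follows. This is a minimal illustration in Python, not the Infer algorithm itself: region variables are plain strings, the null region is the hypothetical constant `EPS`, and the substitution is a dictionary from variables to regions.

```python
EPS = "eps"  # stands for the null region

class RegionUnificationError(Exception):
    """Raised when unification would silently close a region variable."""

def unify_region(r1, r2, subst):
    """Unify two regions under subst (a dict from variable to region)."""
    def resolve(r):
        while r in subst:
            r = subst[r]
        return r

    r1, r2 = resolve(r1), resolve(r2)
    if r1 == r2:
        return subst                      # includes the case eps ~ eps
    if r1 == EPS or r2 == EPS:
        # A variable may never be identified with the null region here;
        # only an explicit close is allowed to do that.
        raise RegionUnificationError(f"cannot unify {r1} with {r2}")
    subst[r1] = r2                        # variable ~ variable
    return subst
```

For instance, `unify_region("r1", "r2", {})` binds one variable to the other, whereas `unify_region("r1", EPS, {})` is rejected.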

We will not discuss the details of the inference algorithm here since it is a trivial extension 
of that in [Ler92]. We only state the following propositions that characterize the soundness 
and the completeness of the inference algorithm with respect to the type system described in 
Section 3.2: 

Proposition 3.19 (Soundness of Type Inference) Let $a$ be an expression and $E$ be a type
environment. If $(\tau, \varphi) = \mathit{Infer}(a, E)$ is defined then we can derive $\varphi(E) \vdash a : \tau$.

Proposition 3.20 (Completeness of Type Inference) Let $a$ be an expression and $E$ be a
type environment. If there exists a type $\tau'$ and a substitution $\varphi'$ such that $\varphi'(E) \vdash a : \tau'$,
then $(\tau, \varphi) = \mathit{Infer}(a, E)$ is defined and there exists a substitution $\psi$ such that $\tau' = \psi(\tau)$ and
$\varphi' = \psi \circ \varphi$.

The proofs of these propositions follow exactly as described in [Ler92].






Chapter 4 

Closing Data-Structures 



So far, we have shown how to close a single mutable reference location. In this chapter, we show
how to extend the use of the close construct to complex, multi-level data-structures involving
tuples, arrays, and general algebraic datatypes. First, we discuss some alternatives for specifying
the dynamic and static semantics of closing multiple locations and regions simultaneously in
a multi-level data-structure. This leads us to devise a type-annotation based specification
mechanism within the source language that permits the user to specify exactly which regions
and their corresponding locations are to be closed. Next, we discuss the strategies for verifying
the correctness of this scheme for arrays and general algebraic datatypes. We also briefly discuss
how this work may be applied to conventional languages such as C, Pascal, or Fortran. Finally,
we present the summary of Part I and directions for future work based on this research.

4.1 Specification of "Close" for Multi-Level Data-Structures 

The static and dynamic CLOSE rules shown in Chapter 3 (Figures 3.2 and 3.1 respectively) only
apply to a single mutable location being returned as the only result from an expression. These
rules clearly need to be extended for the diverse range of data-structures available in a modern
programming language. Id offers tuples, arrays, and general algebraic datatypes (including
recursive datatypes), any of which could be implemented in an imperative manner and may
need to be closed. Furthermore, the exact mutable locations to be closed may be embedded
anywhere inside a complex, structured result returned from a computation. Therefore, we need
a systematic way of closing structured results which involves the following tasks:

1. Given an expression that returns a structured result, we need to specify which locations
to close in the dynamic semantics, and the corresponding regions to close in the static
semantics.

2. We need to statically verify the soundness of the close operation by clearly identifying 
the scope of the imperative operations taking place on the locations being closed. 

As discussed in Section 2.3.1, treating the close construct as an encapsulator clearly delineates
the scope of the imperative operations dynamically taking place on the returned result and it
also statically identifies the type environment against which to verify the closing operation. In
this section, we discuss the first issue of specifying the semantics of closing multiple locations
and regions simultaneously within a structured result.






4.1.1 Dynamic Semantics Issues 

A simple and natural way to extend the dynamic semantics of the close construct to multi-level
data-structures is to take the "all-or-nothing" approach. That is, closing an arbitrary
data-structure recursively closes all its subcomponents and failure to close any one of the
subcomponents results in the failure to close the entire data-structure. This generalized semantics
may be expressed in the following dynamic rule for the close construct:

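\[
\text{DYNAMIC-CLOSE1:}\quad
\frac{e \vdash a/s \Rightarrow v/s_1 \qquad \{l_1 \ldots l_n\} = \mathit{Reachable}(v, s_1) \qquad s_1(l_i) = v_i, rw \quad 1 \le i \le n}
     {e \vdash (close\ a)/s \Rightarrow v/(s_1 + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}
\]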

In the light of the remarks made in Section 2.4.2, we have to be careful not to close locations
that are reachable from the enclosing environment. Otherwise, we would be able to write a
universal closing function such as the closeall function shown below that would incorrectly
close arbitrary mutable objects that are still being used imperatively:

Example 4.1: 

def closeall x = close x;
a = ref 1;
b = closeall a;
a := 2;                  % Dynamic Write Error!

Clearly, such functions should be disallowed because they create spurious dynamic
"write-errors", i.e., writing to a location that has been closed unintentionally. We would like to avoid
such spurious errors or at least detect the possibility of creating such errors at the time of
closing an object rather than at the time of using it. So we modify the DYNAMIC-CLOSE1 rule
to reflect this strategy:



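\[
\text{DYNAMIC-CLOSE2:}\quad
\frac{\begin{array}{c}
e \vdash a/s \Rightarrow v/s_1 \\
\{l_1 \ldots l_n\} = \mathit{Reachable}(v, s_1) \setminus \mathit{Reachable}(e, s_1) \\
s_1(l_i) = v_i, rw \quad 1 \le i \le n
\end{array}}
{e \vdash (close\ a)/s \Rightarrow v/(s_1 + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}
\]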



The above rule simply excludes all the locations reachable from the environment from being
closed. This makes the closeall function of Example 4.1 behave like the identity function
since no external location can now be closed. Alternately, we could introduce a side condition
on the above rule to produce a dynamic "close-error" if any of the locations being closed was
present in the environment.
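
The effect of DYNAMIC-CLOSE2 on Example 4.1 can be sketched in Python. This is an illustrative model under stated assumptions: values are integers, references `("ref", loc)`, or tuples `("tup", v1, ...)`; the store maps locations to (value, tag) pairs; and the function names `reachable` and `dynamic_close2` are hypothetical.

```python
def reachable(value, store, seen=None):
    """Locations reachable from a value, following references through the store."""
    seen = set() if seen is None else seen
    if isinstance(value, tuple) and value[0] == "ref":
        loc = value[1]
        if loc in store and loc not in seen:
            seen.add(loc)
            reachable(store[loc][0], store, seen)
    elif isinstance(value, tuple) and value[0] == "tup":
        for part in value[1:]:
            reachable(part, store, seen)
    return seen

def dynamic_close2(value, env, store):
    """Close every location reachable from value but not from the environment."""
    env_locs = set()
    for bound in env.values():
        reachable(bound, store, env_locs)
    targets = reachable(value, store) - env_locs
    # The rule's premise expects each target to carry the rw tag.
    if any(store[loc][1] != "rw" for loc in targets):
        raise RuntimeError("rule premise fails: a target location is not rw")
    for loc in targets:
        store[loc] = (store[loc][0], "ro")
    return value

# Example 4.1: inside closeall, x is bound to a's location, so nothing is closed.
store = {"l1": (1, "rw")}
dynamic_close2(("ref", "l1"), {"x": ("ref", "l1")}, store)
```

With the argument bound in the environment, the call leaves the tag of l1 unchanged, so closeall behaves like the identity function. A location allocated inside the closed block, and therefore absent from the environment, would be flipped to ro.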

The above rule is still not entirely free of spurious write-errors. In the light of the remarks
made in Section 2.4.4, we should not close locations that are captured within a function closure
because such locations may be modified by the function. The following example illustrates this
scenario:

Example 4.2: 
g = close { b = ref 1;
            def f x = { b := x; };
          in f };
g 2;                     % Dynamic Write Error!






In the above example, an internal mutable location b is captured within a function closure f
which is subsequently closed and returned. If the function body modifies the captured location
(as it does here) then any application of the function would generate a spurious write-error.
We can modify the DYNAMIC-CLOSE2 rule to omit closing such locations:



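\[
\text{DYNAMIC-CLOSE3:}\quad
\frac{\begin{array}{c}
e \vdash a/s \Rightarrow v/s_1 \\
\{l_1 \ldots l_n\} = \mathit{Closable}(v, s_1) \setminus \mathit{Reachable}(e, s_1) \\
s_1(l_i) = v_i, rw \quad 1 \le i \le n
\end{array}}
{e \vdash (close\ a)/s \Rightarrow v/(s_1 + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}
\]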



The closable locations of a value $v$ with respect to a store $s$, written $\mathit{Closable}(v, s)$, are
defined to be all the reachable locations from the given value except those that are reachable
via an embedded function closure. A simple way to compute this set would be to modify the
algorithm Gather-Locations given in Section 3.1.2 to collect the locations reachable through
a function closure at Line 8 in a separate set. This set would then be subtracted from the set
of all reachable locations of a value to yield the set of closable locations of that value.
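
A possible shape of this computation is sketched below in Python. It is not the Gather-Locations algorithm of Section 3.1.2, whose details are not reproduced here; it only mirrors the two-set idea. Values are assumed to be integers, references `("ref", loc)`, tuples `("tup", v1, ...)`, or closures `("clo", captured1, ...)`, and the names `gather` and `closable` are hypothetical.

```python
def gather(value, store, via_closure, direct, through):
    """Collect reachable locations; 'through' receives those reached via a closure."""
    if not isinstance(value, tuple):
        return
    kind, *parts = value
    if kind == "ref":
        loc = parts[0]
        bucket = through if via_closure else direct
        if loc in store and loc not in bucket:
            bucket.add(loc)
            gather(store[loc][0], store, via_closure, direct, through)
    elif kind == "tup":
        for part in parts:
            gather(part, store, via_closure, direct, through)
    elif kind == "clo":
        # Everything captured by the closure is reached "through" it.
        for part in parts:
            gather(part, store, True, direct, through)

def closable(value, store):
    """All reachable locations minus those reachable through a function closure."""
    direct, through = set(), set()
    gather(value, store, False, direct, through)
    return direct - through

# Example 4.2: the returned closure captures b's location, so nothing is closable.
store = {"lb": (1, "rw"), "lc": (0, "rw")}
captured_only = closable(("clo", ("ref", "lb")), store)
mixed = closable(("tup", ("ref", "lc"), ("clo", ("ref", "lb"))), store)
```

In this sketch a location that is reachable both directly and through a closure is excluded, which follows the subtraction described above.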

The DYNAMIC-CLOSE3 rule given above seems fairly reasonable as far as the dynamic
semantics of close is concerned for general, multi-level data-structures.

4.1.2 Static Semantics Issues 

The static semantics for the DYNAMIC-CLOSE3 rule above could be given as follows: 

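\[
\text{STATIC-CLOSE1:}\quad
\frac{E \vdash a : \tau \qquad \{\rho_1 \ldots \rho_n\} = \mathcal{C}(\tau) \setminus \mathcal{F}(E)}
     {E \vdash (close\ a) : \{\rho_i \mapsto \epsilon\}\tau \quad 1 \le i \le n}
\]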



The above rule erases only the closable regions $\mathcal{C}(\tau)$ from the given type $\tau$. This set consists of
all dangerous region variables of the given type except those that occur within the
closure type of a function. The rule also excludes all regions visible in the type environment.

Although the rules DYNAMIC-CLOSE3 and STATIC-CLOSE1 seem plausible at first glance,
unfortunately they cannot be shown to be sound with respect to each other. Intuitively, static
semantics should provide a conservative approximation of what happens dynamically. As far
as the close construct is concerned, this intuition is captured in Proposition 3.15 where we
always maintain a correspondence between the reachable locations of a value and the visible
region variables in its type. Any semantics we give to the close construct must respect
this correspondence, otherwise we will not be able to statically model the dynamics of closing
an object properly. Unfortunately, the rules DYNAMIC-CLOSE3 and STATIC-CLOSE1 do not
correspond to each other in this respect. Consider the following example:

Example 4.3: 



x = ref (ref 1);
y = close { a = ref 2;                    % a -> l1
            b = ref 3;                    % b -> l2
            c = if true then a else b;    % l1 and l2 are region aliased.
            x := c;                       % l1 escapes.
          in b };
y := 4;                                   % Dynamic Write Error!

In the above example, a and b point to two independent reference locations, say $l_1$ and
$l_2$. The conditional statement for c unifies the static region variables corresponding to these
locations; therefore $l_1$ and $l_2$ become region aliased. This means that statically we cannot
distinguish between these two locations. Assuming dynamically that the predicate resolves to
true and c gets bound to $l_1$, we export $l_1$ into the environment by storing it into an external
location and attempt to close $l_2$ by returning it as a closed result. Dynamically, $l_2$ is not
visible in the environment so the DYNAMIC-CLOSE3 rule would close it. On the other hand,
statically there is no difference between $l_1$ and $l_2$, and since $l_1$ is being exported, the static
rule STATIC-CLOSE1 would not close the corresponding region variable, creating a discrepancy
between the static and the dynamic status of the location $l_2$. This would ultimately lead to a
write-error on the $l_2$ location as shown.

Note that this write-error is generated not because the dynamic semantics for close was closing
a location inappropriately as was the case for rules DYNAMIC-CLOSE1 and DYNAMIC-CLOSE2.
This error came about because the static semantics was not sufficiently powerful to model the
dynamic semantics accurately. One way to solve this problem is to classify such write-errors
as static "close-errors" by making the static semantics a little more conservative. This can be
accomplished by causing the static rule to fail when a region variable cannot be closed rather
than ignoring it. The following rule embodies this idea:

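\[
\text{STATIC-CLOSE2:}\quad
\frac{E \vdash a : \tau \qquad \{\rho_1 \ldots \rho_n\} = \mathcal{C}(\tau) \qquad \rho_i \notin \mathcal{F}(E) \quad 1 \le i \le n}
     {E \vdash (close\ a) : \{\rho_i \mapsto \epsilon\}\tau \quad 1 \le i \le n}
\]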

Using this rule, Example 4.3 would be classified as a static close-error and would be rejected,
since an attempt was made to close a region (corresponding to locations $l_1$ and $l_2$) which could
not be statically verified for correctness.
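
The region aliasing in Example 4.3 and its rejection by STATIC-CLOSE2 can be mimicked with a small union-find sketch in Python. The region names (`r0`, `r1`, `ra`, `rb`) and the helper functions are hypothetical; the sketch models only region variables, not full types.

```python
class StaticCloseError(Exception):
    """Raised when a region to be closed is still visible in the environment."""

def find(parent, r):
    # Follow unification links to the representative region variable.
    while parent.get(r, r) != r:
        r = parent[r]
    return r

def unify(parent, r1, r2):
    a, b = find(parent, r1), find(parent, r2)
    if a != b:
        parent[a] = b

def static_close2(parent, closable_regions, env_regions):
    """Fail, rather than skip, when a closable region occurs in the environment."""
    visible = {find(parent, r) for r in env_regions}
    for r in closable_regions:
        if find(parent, r) in visible:
            raise StaticCloseError(f"region {r} is visible in the environment")
    return {find(parent, r) for r in closable_regions}

# x : (int ref(r1)) ref(r0) lives in the environment; a : int ref(ra); b : int ref(rb).
parent = {}
unify(parent, "ra", "rb")   # c = if true then a else b
unify(parent, "r1", "ra")   # x := c
try:
    static_close2(parent, {"rb"}, {"r0", "r1"})
    rejected = False
except StaticCloseError:
    rejected = True
```

After the two unifications, the region of b is indistinguishable from a region visible in the environment, so the attempt to close it is reported statically instead of surfacing later as a write-error.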

Unfortunately, the above rule still suffers from a rather technical problem that stems from
our desire to perform type inference. It turns out that the above rule is not stable under type
substitution (Proposition 3.9). In particular, the set of region variables $\mathcal{C}(\varphi(\tau))$ may turn out
to be larger than the set $\varphi(\mathcal{C}(\tau)) = \{\varphi(\rho_1) \ldots \varphi(\rho_n)\}$ for a general substitution $\varphi$. This implies
that new closable region variables may get introduced into a type by substitution that may not
have been properly verified for correct close semantics previously.

Stability of substitution is used in showing semantic generalization (Proposition 3.14) as
well as the soundness of type inference (Proposition 3.19). The former could be attributed
to the specific style of relational semantics we have decided to follow in this thesis but the
latter is fairly standard machinery in the literature and, if possible, we would like to retain
it. Intuitively, failure of stability of substitution means that it may not be possible to show
the soundness of a type inference algorithm based on this rule using standard unification and
substitution machinery.

4.1.3 Combining Type Generalization and Closing 

One way to devise a stable static rule for the close construct is to combine polymorphic
generalization and object closing into a new language construct letclose x = $a_1$ in $a_2$ that
behaves exactly like let x = $a_1$ in $a_2$ except that it erases all closable regions in the type of
the expression $a_1$ and then immediately generalizes that type before binding the resulting type
scheme to x. Intuitively, type generalization protects a typing derivation from later substitutions
by quantifying its free type variables. Subsequent substitutions are then applied to polymorphic
instantiations of the resulting type scheme, which does not affect the original typing derivation.
A possible dynamic and static semantics of the letclose construct is shown below:






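\[
\text{DYNAMIC-LETCLOSE:}\quad
\frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow v_1/s_1 \qquad \{l_1 \ldots l_n\} = \mathit{Closable}(v_1, s_1) \setminus \mathit{Reachable}(e, s_1) \\
s_1(l_i) = v_i, rw \qquad s_1' = s_1 + \{l_i \mapsto v_i, ro\} \quad 1 \le i \le n \\
e + \{x \mapsto v_1\} \vdash a_2/s_1' \Rightarrow v_2/s_2
\end{array}}
{e \vdash (letclose\ x = a_1\ in\ a_2)/s \Rightarrow v_2/s_2}
\]

\[
\text{STATIC-LETCLOSE:}\quad
\frac{E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{GenClose}(E, \tau_1)\} \vdash a_2 : \tau_2}
     {E \vdash letclose\ x = a_1\ in\ a_2 : \tau_2}
\]

Where,

\[
\begin{array}{rcl}
\{\rho_1 \ldots \rho_n\} & = & \mathcal{C}(\tau) \setminus \mathcal{F}(E) \\
\tau' & = & \{\rho_i \mapsto \epsilon\}\tau \quad 1 \le i \le n \\
\{\alpha_1 \ldots \alpha_m\} & = & \mathcal{F}(\tau') \setminus \mathcal{D}(\tau') \setminus \mathcal{F}(E) \\
\mathit{GenClose}(E, \tau) & = & \forall \alpha_1 \ldots \alpha_m . \tau'
\end{array}
\]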

These rules formalize what we have informally stated in the above paragraph. In this
formulation, closing an object does not fail; instead, the definition of GenClose given above
simply ignores non-closable regions and does not generalize them. This property stems
from the desire to keep type generalization as a non-failing property: if the type of an object
cannot be generalized at a given scope, it is best left as a monomorphic type rather than flagging
a "polymorphism-error".

Unfortunately, the above formulation suffers from the same region aliasing problem as
discussed earlier in the context of the STATIC-CLOSE1 rule. Dynamically closable locations may be
aliased to statically non-closable regions, and this discrepancy is silently ignored in the above
rules. We can fix this problem as in the case of rule STATIC-CLOSE2, by flagging a static
close-error if we fail to close a region that we were expected to close. Unfortunately, this conflicts
with the requirement of non-failing type generalization.

4.1.4 Discussion 

We have seen above that the problem of devising a sound static and dynamic semantics for a 
close construct for multi-level data-structures and functions is sufficiently tricky and has many 
potentially conflicting requirements. This warrants a re-inspection of our approach towards this 
problem. 

Extending the static and dynamic semantics of a language to handle additional complexity 
and/or language constructs must fulfill the following requirements: 

1. The dynamic semantics of a new language construct should be able to accurately model
what that construct is intended to do in a simple and intuitive manner. The semantics
should also take into consideration what is efficiently implementable on a machine. This
conflict among what we intend, what we can model, and what we can efficiently implement
is very important to resolve in the design of a new language construct.

2. Similarly, the static semantics machinery should be intuitive, efficiently implementable,
internally stable, and externally consistent with respect to the dynamic semantics. The
consistency requirement places a lot of constraints on the static machinery and it may
not always yield the most general solutions.

3. Finally, we should also pay attention to other requirements on the design of a new
language construct such as simple and understandable syntax, type inference, etc. that may
not directly affect its semantics or the efficiency of implementation but may affect its
widespread acceptability as a useful construct.

In the light of the above remarks, we have decided to abandon the search for a universal
CLOSE rule. Below, we present our proposal for a family of CLOSE rules for closing a fixed set
of regions and locations depending on the structure of the object at hand.

4.1.5 Closing a Fixed Set of Regions/Locations 

The important point to realize is that closing a known set of locations that are characterized by 
a statically fixed set of region variables is perfectly sound. In the above examples, we ran into
trouble when we tried to close an arbitrary set of locations for which we could not determine a 
statically fixed set of region variables. 

In some sense, closing only a fixed set of region variables at a time gives us more fine grain
control over what locations are being closed dynamically. In order for this strategy to work
with multi-level data-structures, the following requirements must be met:

1. We need to specify statically which region variables we want to close.

2. We should be able to verify the soundness of closing these region variables against the
type environment and other region variables that have not been closed.

3. The locations corresponding to the regions being closed must be similarly identifiable and
closable in the dynamic semantics.

4. Finally, all the locations and the regions being closed and those that are left aside must
jointly satisfy the region abstraction Proposition 3.15, i.e., we cannot close a region
variable statically without closing all its corresponding locations in the dynamic semantics
and vice versa (region aliasing).

The above requirements directly lead us to an approach where we do not have universal
static and dynamic semantics rules for the close construct. Instead, we have an algorithm
to synthesize an exact static and dynamic semantics rule for each multi-level data-structure
pattern that we wish to close. This would give rise to a family of rules depending on the
structure of the object at hand and the particular set of locations we wish to close within that
object. For example, closing an $n$-tuple consisting of $n$ reference locations can be accomplished
using the following rules (c.f. single reference CLOSE rules in Figures 3.1 and 3.2):

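\[
\text{DYNAMIC-TUPCLOSE:}\quad
\frac{\begin{array}{c}
e \vdash a/s \Rightarrow (n\text{-}tup\ l_1 \ldots l_n)/s_1 \qquad s_1(l_i) = v_i, rw \quad 1 \le i \le n \\
L = \mathit{Reachable}((n\text{-}tup\ l_1 \ldots l_n), s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in Dom(s)} \mathit{Reachable}(l', s_1)
\end{array}}
{e \vdash (close\ a)/s \Rightarrow (n\text{-}tup\ l_1 \ldots l_n)/(s_1|_L + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}
\]

\[
\text{STATIC-TUPCLOSE:}\quad
\frac{\begin{array}{c}
E \vdash a : (\tau_1\ ref(\rho_1)), \ldots, (\tau_n\ ref(\rho_n)) \\
\rho_i \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau_1) \cup \cdots \cup \mathcal{F}(\tau_n)) \quad 1 \le i \le n
\end{array}}
{E \vdash close\ a : (\tau_1\ ref(\epsilon)), \ldots, (\tau_n\ ref(\epsilon))}
\]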



Similar rules may be constructed for any subset of tuple fields containing reference values.
Extending the above rules for closing tuples of references and vectors, we can easily handle the
following example that combines their use in a non-standard way:






Example 4.4: 

def polar2rect n = 

close { xs = i_vector (l,n); 
ys = i_vector (l,n); 
rsum = ref 0.0; 
_ = { for i <- 1 to n do 

rad,theta = . . . some large computation . . . ; 
xs [i] = rad * sin theta; 
ys [i] = rad * cos theta; 
rsum := !rsum + rad; } 
in !rsum/n, xs, ys }; 

Here, two vectors are closed and returned along with the accumulated average of a third
quantity, all arising out of the same large shared computation. It is important not to repeat
the computation and keep the storage space to a minimum. The use of an imperative style
protected by the close construct makes the computation efficient and understandable without
sacrificing overall functional behavior.

Steps in Synthesizing CLOSE Rules 

In general, given an arbitrary program expression $a$ that returns a structured result, synthesizing
a specialized static CLOSE rule involves the following steps:

1. A group of region variables to be closed are identified from the type of the expression a 
using some appropriate language syntax. 

2. These region variables are then verified for soundness. This requires that none of these
region variables should occur in the type environment and in the type of the closed result
being returned. Furthermore, none of these region variables should occur inside the closure
type of an embedded function type as pointed out earlier.

3. If all the region variables pass the verification, they are erased from the type of the result,
and the closed type is returned. Otherwise a static close-error is flagged.

Similarly, synthesizing a specialized dynamic CLOSE rule involves the following steps:

1. A group of locations to be closed is identified from the given value that correspond to the 
static region variables being closed. 

2. These locations are verified for possessing the read/write tag within the current store.
Otherwise, a dynamic close-error is raised.

3. If all the locations pass the verification, their tags are flipped to read-only and the closed
value is returned along with the current store with a slightly restricted domain as shown
in Chapter 3, dropping any region-aliased handles to the locations being closed.
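
For the special case of a tuple of references, the static and dynamic steps above can be sketched in Python as follows. The type encoding (`("int",)` and `("ref", element_type, region)`), the constant `EPS` for the null region, and all function names are assumptions made for illustration. The sketch also omits function types and the restriction of the store domain mentioned in the last step.

```python
EPS = "eps"  # stands for the null region

class StaticCloseError(Exception):
    pass

class DynamicCloseError(Exception):
    pass

def regions(ty):
    """Region variables occurring in a type."""
    if ty[0] == "ref":
        own = set() if ty[2] == EPS else {ty[2]}
        return regions(ty[1]) | own
    return set()

def static_tupclose(field_types, env_regions):
    to_close = [ty[2] for ty in field_types]              # step 1: identify
    forbidden = set(env_regions)
    for ty in field_types:
        forbidden |= regions(ty[1])                       # regions inside element types
    for r in to_close:                                    # step 2: verify
        if r in forbidden:
            raise StaticCloseError(f"region {r} cannot be closed")
    return [("ref", ty[1], EPS) for ty in field_types]    # step 3: erase

def dynamic_tupclose(locs, store):
    for loc in locs:                                      # step 2: every tag must be rw
        if store[loc][1] != "rw":
            raise DynamicCloseError(f"location {loc} is not read/write")
    for loc in locs:                                      # step 3: flip tags to ro
        store[loc] = (store[loc][0], "ro")
    return locs

fields = [("ref", ("int",), "r1"), ("ref", ("int",), "r2")]
closed_types = static_tupclose(fields, env_regions=set())
store = {"l1": (1, "rw"), "l2": (2, "rw")}
dynamic_tupclose(["l1", "l2"], store)
```

Both halves fail as a whole: if any region or location does not pass its check, nothing is erased or flipped.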

4.1.6 Type Annotations as "Close" Specifications 

A simple way of specifying which regions to close in an arbitrary expression is to match it
against a separate pattern and mark certain regions to be closed in that pattern. Note that
this pattern matches the type of the expression and not its value. This is because several
locations may be aliased to the same region variable by definition and we must close all of them
simultaneously. It therefore makes sense to specify them once using their type rather than specify
each of the locations individually.

A type pattern may be specified in a type annotation for the close expression as shown 
below: 

expressions:    a ::= ...
                    | (close a) :: $\tau_{ann}$        close expression

Here, the expression $a$ would usually be a program block which returns a structured result.
The annotation type $\tau_{ann}$ would explicitly show the various type constructors present within
the expression's type along with their region parameters. The precise region parameters to be
closed are specified using the null region ($\epsilon$). The syntax used for specifying the annotation
type is the full type grammar shown in Section 3.2.1 with the addition of a "don't care" type
pattern (_) that may be used in place of any type, region, or closure type expression within the
annotation. The scope of the free type, region, and closure extension variables of the annotation
type is taken to be that annotation itself; annotation types in different parts of the program do
not share variables.

Examples of this specification have already appeared in Chapter 2 within Examples 2.15,
2.20, and 2.21. The static typing rule for such type-annotated expressions may now be given
as follows:



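\[
\text{ANNOTE-CLOSE:}\quad
\frac{E \vdash a : \tau_{inf} \qquad \{\rho_1 \ldots \rho_n\} = (\tau_{inf} \sim \tau_{ann}) \qquad \rho_i \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau_{ann})) \quad 1 \le i \le n}
     {E \vdash (close\ a :: \tau_{ann}) : \tau_{ann}}
\]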

The type $\tau_{inf}$ stands for the inferred type of the expression $a$. The operation $(\tau_{inf} \sim \tau_{ann})$
matches the annotation type against the inferred type to determine the exact set of region
variables being closed. Unlike the STATIC-CLOSE2 rule, this set remains stable under type
substitution because the annotation type never changes. Below, we outline the mechanism of
type and region matching and the subsequent verification of the close operation:

1. The types $\tau_{inf}$ and $\tau_{ann}$ must match exactly [1], except that some region variables in $\tau_{inf}$ may
be closed in $\tau_{ann}$. For each parameterized type constructor $T(\rho_1 \ldots \rho_n)$ the number of
regions in the inferred and annotated type must also match. For syntactic convenience,
we may allow a parameterized type constructor to appear without any region parameters
in the given annotation, in which case all its region parameters are assumed to be the
null region.

2. Each inferred region parameter is positionally matched with the corresponding annotated
region parameter in order to determine the precise set of region variables being closed:

• A null region in the inferred type must match a null region in the annotation type.
These represent previously closed regions that cannot be opened again.

• A region variable $\rho$ in the inferred type matches a null region in the annotation type
and is considered as being closed unless it occurs within the closure type of a function
(Section 2.4.4). In the latter case, a static close-error is flagged.

• A region variable $\rho$ in the inferred type also matches a region variable $\rho'$ in the
annotation type as long as all occurrences of $\rho$ in the inferred type match the same region
variable $\rho'$ in the annotation type. For convenience, we may allow this matching
to behave like a region variable constraint on the inferred region parameters rather
than a mere renaming of variables. A unification substitution $\{\rho \mapsto \rho'\}$ may need to
be generated in this case.

3. Finally, all region variables determined as being closed are collected in a set taking region
variable constraints and variable renaming into account. This set of region variables, say
$\{\rho_1 \ldots \rho_n\}$, can then be verified for soundness as shown in the above rule ANNOTE-CLOSE.
Checking that no region variable $\rho_i$ being closed appears anywhere within the current
type environment $E$ or within the annotation type $\tau_{ann}$ ensures that the corresponding
closed locations are not reachable from the dynamic environment or the returned value.
This is similar in spirit to the simple CLOSE rule shown in Figure 3.2.

[1] Each occurrence of the "don't care" type pattern (_) within the annotation type is always assumed to match
the corresponding type, region, or closure type expression present in the inferred type.
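
The matching and verification procedure outlined above can be approximated by the following Python sketch. It is a simplification under explicit assumptions: types are encoded as `("int",)`, `("tup", t1, ..., tn)`, and `("ref", element_type, region)`; the null region is the hypothetical constant `EPS`; and the "don't care" pattern and function closure types are left out.

```python
EPS = "eps"  # stands for the null region

class StaticCloseError(Exception):
    pass

def match(inferred, ann, closed, renaming):
    """Match an inferred type against an annotation, recording closed regions."""
    if inferred[0] != ann[0] or len(inferred) != len(ann):
        raise StaticCloseError("type shapes differ")
    if inferred[0] == "ref":
        match(inferred[1], ann[1], closed, renaming)
        r, r_ann = inferred[2], ann[2]
        if r == EPS:
            if r_ann != EPS:
                raise StaticCloseError("a closed region cannot be reopened")
        elif r_ann == EPS:
            closed.add(r)                      # region variable being closed
        elif renaming.setdefault(r, r_ann) != r_ann:
            raise StaticCloseError("region matched against two annotation variables")
    else:
        for sub, sub_ann in zip(inferred[1:], ann[1:]):
            match(sub, sub_ann, closed, renaming)

def annote_close(inferred, ann, env_regions):
    """Return the annotated type and the closed regions, or flag a close-error."""
    closed, renaming = set(), {}
    match(inferred, ann, closed, renaming)
    # A closed region must not remain visible in E or in the result type.
    still_visible = set(env_regions) | set(renaming)
    if closed & still_visible:
        raise StaticCloseError("a closed region is still visible")
    return ann, closed

inferred = ("tup", ("ref", ("int",), "r1"), ("ref", ("int",), "r2"))
ann = ("tup", ("ref", ("int",), EPS), ("ref", ("int",), "q"))
result_type, closed_regions = annote_close(inferred, ann, env_regions=set())
```

In this example only the first field is closed; the second keeps its region under the annotation's own variable name.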

The above scheme achieves both our original goals of specifying the regions to be closed
and pinpointing the type environment to verify them against with a single, familiar language
construct. Moreover, it specifies multiple regions to be closed at various levels of a
structured result simultaneously, and it does this without adding semantic or syntactic
complexity beyond what was already present in the kernel language of Chapter 3.

This scheme also identifies the dynamic locations to be closed quite easily. The structure
of tuple types directly reflects the structure of the tuples themselves. Therefore, the static
distribution of region variables to be closed within a structured type annotation directly leads
us to the locations that need to be closed in the corresponding structured result. Locations
within embedded function closures must never be closed, which is why the corresponding region
variables are caught and flagged as a static close-error.

In the next two sections, we describe the semantics and close specification for arrays and
general algebraic datatypes based on the above strategy.

4.2 Closing Arrays 

4.2.1 Dynamic Semantics 

We can easily generalize a single mutable reference location introduced in Chapter 3 to an array
of indexed locations all of which belong to the same region. In fact, the ref construct may be
viewed as a special case of a 1-dimensional array with length 1. Indexed locations effectively
model consecutive memory addresses on which index computations may be performed, although
the starting location of the array would still remain abstract. This treatment of locations is
a little more concrete than that in Chapter 3 where every location was considered to be an
independent abstract label.

We represent a 1-dimensional array as a pair (vect $l$, $n$) giving the starting location $l$ and
its length as a positive integer literal $n$. These are added to the set of dynamic values:

values:    v ::= ...
               | (vect $l$, $n$)        vector of length $n$

The values associated with the slots $0 \le i < n$ of a vector (vect $l$, $n$) are stored at the
locations $l, \ldots, l + n - 1$ within the store $s$. All these locations are assumed to be directly
accessible from the vector value:

\[
\mathcal{L}((vect\ l, n)) = \{l, \ldots, l + n - 1\}
\]






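\[
\text{VECT-ALLOC:}\quad
\frac{e \vdash a/s \Rightarrow n/s_1 \qquad (l + i) \notin Dom(s_1) \quad 0 \le i < n}
     {e \vdash allocvect(a)/s \Rightarrow (vect\ l, n)/(s_1 + \{l + i \mapsto \bot, rw\}) \quad 0 \le i < n}
\]

\[
\text{VECT-DEREF:}\quad
\frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow (vect\ l, n)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow i/s_2 \\
(l + i) \in Dom(s_2) \qquad value(s_2(l + i)) = v
\end{array}}
{e \vdash a_1[a_2]/s \Rightarrow v/s_2}
\]

\[
\text{VECT-ASSIGN:}\quad
\frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow (vect\ l, n)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow i/s_2 \qquad e \vdash a_3/s_2 \Rightarrow v/s_3 \\
(l + i) \in Dom(s_3) \qquad tag(s_3(l + i)) = rw
\end{array}}
{e \vdash (a_1[a_2] = a_3)/s \Rightarrow ()/(s_3 + \{l + i \mapsto v, rw\})}
\]

\[
\text{VECT-CLOSE:}\quad
\frac{\begin{array}{c}
e \vdash a/s \Rightarrow (vect\ l, n)/s_1 \qquad s_1(l + i) = v_i, rw \quad 0 \le i < n \\
L = \mathit{Reachable}((vect\ l, n), s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in Dom(s)} \mathit{Reachable}(l', s_1)
\end{array}}
{e \vdash (close\ a)/s \Rightarrow (vect\ l, n)/(s_1|_L + \{l + i \mapsto v_i, ro\}) \quad 0 \le i < n}
\]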



Figure 4.1: Dynamic Semantics of Arrays. 



We also extend reachability (Definition 3.2) for vector values: 



\[
\mathit{Reachable}((vect\ l, n), s) =
\begin{cases}
\emptyset & \text{if } l \notin Dom(s) \\
\mathcal{L}((vect\ l, n)) \cup \bigcup_{0 \le i < n} \mathit{Reachable}(value(s(l + i)), s) & \text{otherwise}
\end{cases}
\]

The algorithm Gather-Locations is correspondingly extended to collect such locations. 

Figure 4.1 shows the dynamic semantics rules for 1-dimensional arrays. These are straightforward
generalizations of the corresponding rules for the ref construct. The primitive operator
rules for vector allocation (allocvect), vector dereference (a[i]), and vector assignment
(a[i]=v) operate as expected. During vector allocation, $n$ fresh locations are added to the
domain of the store, each of which is initialized to a special "undefined" constant ($\bot$) [2]. The
domain validity test in dereference and assignment rules simulates bounds checking because only
the indices within the bounds $l, \ldots, l + n - 1$ would be present within the domain of the store
for a given vector value (vect $l$, $n$). Finally, the VECT-CLOSE rule closes all the locations of the
vector simultaneously.

Multi-dimensional arrays may be modeled in a similar fashion or may be linearized into
1-dimensional arrays. In the latter case, the linearized vector value may need to keep additional
information to translate a multi-dimensional index into a linearized index.
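
The dynamic rules for 1-dimensional arrays can be exercised with a small Python model. This is a sketch under assumptions, not the semantics of Figure 4.1 itself: locations are integers, the store maps them to (value, tag) pairs, `UNDEF` stands in for the undefined constant, and the bounds test is written as an explicit index check where the rules use membership in the store domain. All function names are hypothetical.

```python
UNDEF = object()  # stands for the "undefined" constant

class WriteError(Exception):
    pass

def allocvect(n, store):
    """Allocate n fresh consecutive rw locations and return the vector value."""
    base = max(store, default=-1) + 1
    for i in range(n):
        store[base + i] = (UNDEF, "rw")
    return ("vect", base, n)

def deref(vec, i, store):
    _, base, n = vec
    if not 0 <= i < n:
        raise IndexError(i)
    return store[base + i][0]

def assign(vec, i, value, store):
    _, base, n = vec
    if not 0 <= i < n:
        raise IndexError(i)
    if store[base + i][1] != "rw":
        raise WriteError("vector is closed")
    store[base + i] = (value, "rw")

def close_vect(vec, store):
    """Flip the tag of every slot from rw to ro, all at once."""
    _, base, n = vec
    if any(store[base + i][1] != "rw" for i in range(n)):
        raise WriteError("vector is already closed")
    for i in range(n):
        store[base + i] = (store[base + i][0], "ro")
    return vec

store = {}
xs = allocvect(3, store)
assign(xs, 0, 1.5, store)
close_vect(xs, store)
```

After the close, reads through `deref` still succeed while any further `assign` on the vector raises an error, mirroring the single-reference case.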



4.2.2 Static Semantics 

Since arrays are considered to be homogeneous data-structures, all values contained in an array must
have the same type and all its locations must belong to the same region. This means that a
single region variable suffices to represent the imperative properties of the array. Therefore, a
mutable vector containing values of type $\tau$ is typed as ($\tau\ vector(\rho)$) just like a mutable reference
type ($\tau\ ref(\rho)$). The free and dangerous variables of the vector type are also computed just like
those for a reference type.



² This formulation is useful for synchronized arrays (I-structures and M-structures); conventional
unsynchronized arrays as shown here may in fact be initialized with any constant of the appropriate type.






The types of the primitive array operators are shown below: 

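$$
\begin{aligned}
\mathit{typeof}(\texttt{allocvect}) &= \forall t,u,r.\ \mathit{int} \mathrel{-(u)\!\rightarrow} t\ \mathit{vector}(r) \\
\mathit{typeof}(\texttt{\_[\_]}\ \text{mutable}) &= \forall t,u,r.\ (t\ \mathit{vector}(r),\ \mathit{int}) \mathrel{-(u)\!\rightarrow} t \\
\mathit{typeof}(\texttt{\_[\_]}\ \text{non-mutable}) &= \forall t,u.\ (t\ \mathit{vector}(\epsilon),\ \mathit{int}) \mathrel{-(u)\!\rightarrow} t \\
\mathit{typeof}(\texttt{\_[\_]=\_}) &= \forall t,u,r.\ (t\ \mathit{vector}(r),\ \mathit{int},\ t) \mathrel{-(u)\!\rightarrow} \mathit{unit}
\end{aligned}
$$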

The static semantics rule for closing arrays operates exactly like that for the ref construct 
and is shown below: 

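$$
\text{VECT-CLOSE:}\quad
\frac{E \vdash a : \tau\ \mathit{vector}(r) \qquad r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))}
     {E \vdash \mathtt{close}\ a : \tau\ \mathit{vector}(\epsilon)}
$$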

All the proofs for the ref construct given in Sections 3.1 and 3.2 extend naturally to arrays
since all the locations contained within an array are simply an extension of its starting location
l. We never create "internal" pointers into the middle of an array and operate on individual
locations of the array. For instance, all indexed references on vectors operate on the value
(vect l, n) and an index offset i; the location l + i by itself is not taken to be a valid value.
For the purpose of reachability, this ensures that all locations of an array are always taken
together as a group, which is similar in spirit to the treatment of the ref construct.

4.2.3 Semantic Model and Soundness 

The store typing S carries the type (τ vector(ρ)) at every location of the vector just like it
carries the full reference type at a ref allocated location. Thus, we can extend the semantic
model (Definition 3.12) in the obvious manner:

Definition 4.1 (Extended Semantic Model) Let s be a store, S be a store typing, e be an
environment, E be a type environment, v be a value, τ be a type, and σ be a type scheme.
Define the following relations:
Case 1: S ⊨ v : τ ...

SubCase 1.6: S ⊨ (vect l, n) : τ vector(r), if (l + i) ∈ Dom(S) and S(l + i) = τ vector(r)
for all 0 ≤ i < n.
SubCase 1.7: S ⊨ (vect l, n) : τ vector(ε), if (l + i) ∈ Dom(S) and S(l + i) = τ' for all
0 ≤ i < n. Furthermore, there exists a substitution φ with Dom(φ) ⊆ F(τ') \ D(τ') such
that φ(τ') = τ ref(ε).

Case 4: ⊨ s : S ...

SubCase 4.3: If S(l) = τ vector(r) then s(l) = v, rw and S ⊨ v : τ.
SubCase 4.4: If S(l) = τ vector(ε) then s(l) = v, ro and S ⊨ v : τ.

Proofs for semantic soundness from Section 3.3 also extend naturally to vectors using this
extended semantic model. A simple reference value l is replaced by a vector value (vect l,n)
and statements about the store typing of that location S(l) are replaced by those applying to
the group of locations S(l + i) for all 0 ≤ i < n. Proofs that do not directly depend on the
structure of values or of evaluation rules, such as the region abstraction Proposition 3.15, do
not change at all.

The above machinery allows us to finally answer the problem we posed at the beginning of 
Section 2.1 about implementing functional arrays in Id. The solution proposed in Section 2.3 
for implementing function make_vector (Example 2.13) can now be automatically verified for 
correctness by the type system and is reproduced below: 






Example 4.5: 

i_vector :: ∀t,u,r. (int, int) -(u)-> (t vector(r))

make_vector :: ∀t,u. (int -(u)-> t) -> (int, int) -(int -(u)-> t)-> (t vector(ε))
def make_vector f (l,u) =
  close { a = i_vector (l,u);

          _ = { for i <- l to u do
                  a[i] = f i };
        in a };

The i_vector primitive allocates an empty vector between bounds (l, u) and initializes it to
contain the "undefined" value (⊥) everywhere. The region variable in the type of the allocated
vector shows that it is assignable. On the other hand, the null region (ε) in the type of the
returned vector from make_vector shows that it has been safely closed into a functional vector.
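
For readers more familiar with conventional languages, the behaviour of make_vector can be approximated by the Python sketch below. This is not the Id implementation: the class Vector and the function close are hypothetical stand-ins, None plays the role of ⊥, and the guarantee that the type system provides statically is replaced here by a run-time flag.

```python
class Vector:
    """A vector with inclusive bounds (lo, hi) that can be closed once filled."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self._cells = [None] * (hi - lo + 1)   # None stands in for "undefined"
        self._closed = False

    def _offset(self, i):
        if not self.lo <= i <= self.hi:
            raise IndexError(i)
        return i - self.lo

    def __getitem__(self, i):
        return self._cells[self._offset(i)]

    def __setitem__(self, i, value):
        if self._closed:
            raise TypeError("vector has been closed")
        self._cells[self._offset(i)] = value


def close(vector):
    """Switch the view of the vector from mutable to functional."""
    vector._closed = True
    return vector


def make_vector(f, bounds):
    lo, hi = bounds
    a = Vector(lo, hi)               # corresponds to i_vector (l,u)
    for i in range(lo, hi + 1):
        a[i] = f(i)
    return close(a)                  # no write handle to a survives this point
```

Because a is local to make_vector and only the closed view is returned, callers can read the result but cannot update it.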

4.2.4 Modeling I-Structure and M-Structure Arrays 

Readers may have noticed that the above description only presents unsynchronized mutable 
arrays that are closed into unsynchronized functional arrays. A few words are appropriate here 
regarding the modeling of synchronized (I-structure and M-structure) arrays present in Id. 

As discussed in Section 2.3.5, a mutable array may be implemented using any one of the
three underlying memory access protocols: unsynchronized, I-structure, or M-structure (refer
to Figure 2.1). Similarly, a functional array may be implemented using one of the two protocols:
unsynchronized or I-structure. However, the static typing machinery presented above only allows
us to distinguish between a single mutable vector type vector(r) and its corresponding functional
vector type vector(ε). It does not matter which underlying protocol each type represents
as long as we use the appropriate kind of barrier during the close operation (see Section 2.3.5),
and objects belonging to the two types are represented in the same way. The latter condition
is required so that the close construct can simply change the view of an object from
mutable to functional without requiring any data layout conversion.

In a conventional language such as C or Fortran with only one kind of memory access protocol
(unsynchronized), the simple two-way classification described above is sufficient. However,
in Id we use two memory access protocols, I-structure and M-structure, giving rise to two types
of assignable arrays and one type of functional arrays. Since functional objects in Id are also
implemented using I-structures, it is natural to use the I-structure protocol for objects with
either the assignable type vector(r) or the functional type vector(ε). This way, the underlying
data layout is guaranteed to be the same in the two cases and no barrier is needed during the
corresponding close operation. This leaves us with the question of how to type M-structure
arrays and close them into functional arrays. Below, we discuss some possibilities.

One possibility is to assign M-structure arrays a separate mutable type constructor, say
m_vector(r), and then somehow convert the type constructor m_vector into vector when closing.
Semantically, this is not very clean because it requires an additional type conversion during the
close operation. Moreover, this scheme does not express the language constraint that the
layout of M-structure and functional objects is expected to be the same. That constraint is buried
under the semantics of the type conversion operation from M-structure objects to functional
objects, which is left unspecified. Unsuspecting compiler writers may choose different data
representations for M-structure and functional objects, which would make the close operation
on M-structure objects incorrect (or extremely inefficient).

Another possibility is to expand our region algebra to accommodate two different kinds
of mutable objects: I-structure and M-structure. This is easily accomplished by using two
kinds of region variables: rⁱ denoting I-structure regions, and rᵐ denoting M-structure regions.
No implicit conversions would be allowed between the two kinds of region variables via type
substitution or instantiation. The close construct would be used to explicitly close either kind
of region variable into a null region. It is easy to see that all the semantic machinery presented
in Chapter 3 would extend trivially to this scheme.

Under this scheme, a single parameterized type constructor may be used to denote all
three kinds of arrays: vector(rᵐ) for M-structure arrays, vector(rⁱ) for I-structure arrays, and
vector(ε) for functional arrays. The uniform type constructor used in all cases denotes the
language constraint that the underlying data layout should be the same in all three cases. This
scheme clearly separates the semantic modeling of the layout of an object, which is denoted by
its type constructor, from the modeling of its mutability and synchronization properties, which
is denoted by its region parameters.

It is easy to see that the region algebra may be enriched even further in order to accommodate
unsynchronized objects within the same framework. This ability provides a natural
extension to our type system when adding unsynchronized objects to Id, or adding I-structure
and M-structure objects to conventional languages such as C or Fortran.

4.3 Closing General Algebraic Datatypes 

4.3.1 Specification Issues 

General algebraic datatypes introduce yet another dimension in the syntactic specification of
closable regions and locations. In this section, we informally present some of the issues via
examples that are formalized in later sections.

Multiple Region Parameters 

Consider the functional list datatype declaration shown below: 

Example 4.6: 

type list t = nil | cons t (list t) ; 

There are two fields in the cons constructor, either or both of them could be made mutable
and closed independently. When a field of a datatype becomes mutable, it has to be tagged
with a region variable which is reflected in the datatype constructor as a region parameter (e.g.,
the type constructors ref(ρ) or vector(ρ)). There is some flexibility in deciding whether to add
additional region parameters to a type constructor for each mutable field or tag several mutable
fields with the same region variable.

One possibility is to always require the user to specify the distribution of region parameters
explicitly. On the other hand, it may be possible for the compiler to automatically add the
region parameters to a mutable datatype declaration according to some fixed strategy. The
question of whether two mutable fields should be modeled using the same region variable or not
depends on how the fields are manipulated and closed within the rest of the program, although
a fixed, compile-time heuristic is probably more desirable. For instance, the compiler could
simply assign a single region variable per datatype or it could determine the largest independent
set of region variables that would characterize a given datatype, subject to recursive typing
constraints. Thus, either of the following declarations for mutable lists would be acceptable,
although each provides a different degree of flexibility and approximation:






Example 4.7: 

type list(r) t = nil | cons (r)!t (r)!(list(r) t) ;

type list(r1,r2) t = nil | cons (r1).t (r2)!(list(r1,r2) t) ;

In the above declarations, we have prefixed a region variable to each of the mutable fields.³
The first declaration identifies the entire spine of the list with the same region, while the second
declaration classifies heads and tails separately. Whether the first or the second declaration
should be used depends on whether we wish to close heads of a list without closing the tails
or vice-versa. In general, it is useful to have as much flexibility as possible, especially if the
heads and tails employ different memory synchronization protocols (see Section 2.3.5), so the
second declaration appears to be a better choice. However, note that both fields share the same
type variable (t), so we will not be able to generalize objects of this list type unless both regions
are closed. Therefore, if we are only concerned about converting mutable lists to completely
functional lists, then collapsing the two regions into a single one may be more desirable since
it simplifies the datatype representation.

Inherited Region Parameters 

Embedding parameterized types within another algebraic datatype forces the type constructor
being defined to inherit the region parameters of the embedded type; otherwise there would be
no way to generalize such region variables in a Hindley/Milner type system. For example:

Example 4.8: 

type keyref (r) t = mkkeyref I (ref (r) t) ; 

Although none of the fields of the type keyref itself is mutable, it still must inherit the region
parameter r of the embedded type ref; otherwise this parameter could never be generalized and
would always point to the same region. This information can easily be taken into account within
the compiler while computing the region parameters of a datatype declaration automatically.

Closure Type Parameters 

An interesting problem occurs with general algebraic datatypes that may hide function closures
inside them. The closure typing system described in Section 2.2.6 works well with higher-order
functions since we have a way of expressing, propagating, and generalizing over closure types
directly as they are defined while typing a λ-abstraction or instantiated at a function reference.
But, if a function is carried indirectly by storing it within a data-structure, we must still not lose
its closure typing information because of such indirection. Otherwise, write handles embedded
inside such functions could escape undetected. To illustrate this subtle point, consider the
following example:

Example 4.9: 

type capture t0 = capt (int -(u0)-> t0 -(u1)-> t0) ;

def escape_5 n =            % escape_5 :: ∀t0. int -> (vector t0, capture t0)
  close { a = i_vector (1,n);

          def g i v =       % g :: ∀u2 u3. int -(u2)-> t0 -(vector(r) t0, u3)-> t0
            { a[i] = v; in v };

        in a, capt g } :: (vector _),_;

³ A dot (.) in front of a field denotes that it is an I-structure field, while a bang (!) denotes that it is an
M-structure field.

As shown above, the datatype capture has a single type parameter t0 and its sole constructor
capt stores a polymorphic function closure. It is necessary to parameterize this type
with the closure extension variables u0, u1 of the hidden function type so that these variables
can also participate in type generalization and close verification. With the declaration shown
above, the type system will be unable to detect that a write handle to the array a being closed is
escaping via a function closure since that function is hidden inside a data-structure. We should
point out that this parameterization is necessary for the closure typing system itself to work
properly; it is not specifically related to the close construct. Without such parameterization,
one would be able to launder functions with complicated closure types by simply storing them
into a data-structure and then fetching them back. The correct declaration for the datatype
capture is shown below with additional closure type parameters:

Example 4.10: 

type capture t0 u0 u1 = capt (int -(u0)-> t0 -(u1)-> t0) ;

The closure type parameters on datatypes behave exactly like closure extension variables
within closure types of functions. For example, the type of capt g now instantiates the extension
variable u1 of the datatype capture with the closure type (vector(r) t0, u3) of the function g,
thereby exposing the hidden region r embedded within the closure type. This would allow the
subsequent close verification process to flag the escaping region as a static close-error.

4.3.2 Syntactic Specification of Algebraic Datatypes 

Now, we are ready to show the full machinery for the specification of general algebraic datatypes.
A general algebraic datatype declaration is shown below:

$$
\begin{aligned}
\mathtt{type}\ T(r_{1 \ldots i})\ t_{1 \ldots j}\ u_{1 \ldots k} \;=\;\; & C_1\ (\rho_{11})\tau_{11} \cdots (\rho_{1a_1})\tau_{1a_1} \\
& \vdots \\
\mid\; & C_m\ (\rho_{m1})\tau_{m1} \cdots (\rho_{ma_m})\tau_{ma_m}
\end{aligned}
$$

This declares a type constructor T, with $r_1 \ldots r_i$ as region variable parameters, $t_1 \ldots t_j$ as
type variable parameters, and $u_1 \ldots u_k$ as closure extension variable parameters. This datatype
has m constructor disjuncts $C_1 \ldots C_m$, each with its own arity $a_1 \ldots a_m$, any of which could be
zero. Each field of a non-nullary constructor $C_p$ has an independent type $\tau_{pq}$ and a region
expression $\rho_{pq}$. The type $\tau_{pq}$ may use region, type, and closure variables from the declared
parameters of the datatype T. The region expression $\rho_{pq}$ either consists of exactly one region
variable parameter denoting that this field is mutable or it is the null region ε denoting that
this field is functional.⁴

The above declaration may be supplied by the user, or the compiler may automatically
augment an ordinary datatype declaration containing only type variable parameters with additional
region and closure extension parameters. In order to do so, the user must at least specify
which fields are expected to be mutable and which ones are functional. Then, a maximally
independent set of region variable parameters and a set of closure extension parameters may be
computed for each datatype T declared within the program using the following steps:

1. First we assign region expressions $\rho^T_{pq}$ to each field of each datatype T declared within the
program as follows:

$$
\rho^T_{pq} =
\begin{cases}
r^T_{pq} & r^T_{pq} \text{ is new and the } pq\text{-th field in the datatype } T \text{ is mutable} \\
\epsilon & \text{otherwise}
\end{cases}
$$

Each datatype T is initially assigned the region parameters $R^T = \bigcup \rho^T_{pq}$ and the closure
extension parameters $U^T = \bigcup \mathit{Closure\text{-}Variables}(\tau^T_{pq})$.

⁴ Additional syntax may be used to distinguish between I-structure and M-structure fields.

2. Now we construct a datatype reference graph consisting of all the datatypes declared
within the program, where there is an edge from a datatype $T_1$ to another datatype
$T_2$ if $T_2$ occurs within some field type $\tau_{pq}$ of $T_1$. We partition the nodes of this graph
into strongly connected components (SCC) [AHU74] according to this (directed) edge
criterion. This puts mutually recursive datatypes into the same component. We will use
this information to assign the same region and closure extension parameters to mutually
recursive datatypes.

3. Now, proceeding in a topologically bottom-up fashion on each SCC of the above reference
graph, we compute the final set of region and closure extension parameters for each
datatype as follows. If two datatypes $T_1$ and $T_2$ belong to the same SCC, then all occurrences
of one inside the other use the same variables. If $T_1$ refers to $T_2$ and they belong
to different SCCs, then for each occurrence of $T_2$ within the declaration of $T_1$ we rename
the parameters associated with $T_2$ ($R^{T_2}$ and $U^{T_2}$) to fresh variables and recompute the
parameters of $T_1$ ($R^{T_1}$ and $U^{T_1}$).

4. Finally, each datatype T within the same SCC is assigned the region parameters $\bigcup_{T \in SCC} R^T$
and the closure extension parameters $\bigcup_{T \in SCC} U^T$.
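
The steps above can be sketched in Python as follows. This is a simplified illustration rather than the thesis algorithm in full: only region parameters are computed (closure extension parameters are omitted), a datatype is described by a hypothetical list of (is_mutable, referenced_datatypes) pairs, one per field, and fresh region names r1, r2, ... are generated by a counter. The strongly connected components are found with Tarjan's algorithm, which emits referenced components before the components that refer to them.

```python
from itertools import count


def strongly_connected_components(graph):
    """Tarjan's algorithm; components are emitted referenced-first."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a component
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            out.append(scc)

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return out


def region_parameters(decls):
    """decls maps a datatype name to a list of (is_mutable, [referenced names])."""
    fresh = count(1)

    def new_region():
        return f"r{next(fresh)}"

    # Step 1: one new region variable per mutable field.
    own = {t: [new_region() for mutable, _ in fields if mutable]
           for t, fields in decls.items()}

    # Step 2: datatype reference graph and its SCCs.
    graph = {t: {u for _, refs in fields for u in refs if u in decls}
             for t, fields in decls.items()}

    # Steps 3 and 4: bottom-up over the SCCs; members of an SCC share parameters.
    params = {}
    for scc in strongly_connected_components(graph):
        members = set(scc)
        shared = []
        for t in scc:
            shared += own[t]
            for _, refs in decls[t]:
                for u in refs:
                    if u in decls and u not in members:
                        # inherit a freshly renamed copy per occurrence
                        shared += [new_region() for _ in params[u]]
        for t in scc:
            params[t] = list(shared)
    return params
```

For a list declaration with a mutable head and a mutable recursive tail, the sketch yields two region parameters; for a keyref-like declaration with no mutable field of its own, it yields one inherited parameter.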

Intuitively, the above algorithm assigns a new region variable to each statically distinguishable
mutable field, keeping track of inherited and recursive regions. In this sense, it computes a
maximally independent set of region variables for each datatype. For example, this algorithm
would automatically compute the region assignment (list(r1,r2) t) shown in Example 4.7 for
the following type declaration which specifies both heads and tails as being mutable:

Example 4.11: 

type list t = nil | cons !t !(list t) ;

4.3.3 Dynamic Semantics 

Dynamically, each constructor disjunct $C_p$ gives rise to a value $(C_p\ v_1 \cdots v_a)$ where $C_p$ denotes
a tag that identifies the disjunct and $v_1 \cdots v_a$ are its field values. The value corresponding
to a mutable field is a unique location $l_{pq}$ whose contents are accessible through the store.
This generalized representation subsumes the functional n-tuples ($(\text{n-tup}\ v_1, \ldots, v_n)$) and single
mutable reference cells (l) used in Chapter 3 because it permits individual locations of a tuple
itself to be mutable. In order to avoid confusion, we now represent individual mutable reference
cells such as those used in Chapter 3 using the following datatype declaration:

Example 4.12: 

type ref(r) t = ref (r)!t;






A mutable reference cell would now be represented as (ref l) instead of a bare location l,
which by itself is no longer considered to be a proper value and may only appear as a mutable
field value within a constructor value.⁵

The locations directly contained in a constructor value, $\mathcal{L}((C_p\ v_1 \cdots v_a))$, are naturally
defined to be the set of field values that are locations. Similarly, the reachable locations of
a constructor value (with respect to a store s) are the set of locations directly or indirectly
reachable from all the fields of the constructor.

The primitive operations of allocation, dereference, and assignment extend naturally to constructor
disjuncts and their embedded mutable and non-mutable fields. The reader is referred
to [Nik91] for details of the exact syntax used in Id. The dynamic semantics of these operations
is given by a family of allocation, dereference, and assignment rules on the lines of those shown
for reference cells in Chapter 3.

The dynamic semantics of closing a constructor value follows the discussion in Section 4.1.
The main problem is to identify the set of dynamic locations to match the specified region
variables that are being closed in a general algebraic datatype. For non-recursive datatypes,
the locations to be closed are exactly those carried directly within the constructor value at the
field position corresponding to the region variable being closed. As an example, we reproduce
the point datatype from Example 2.17 below with explicit region parameters. Both fields of
the point pt1 are closed while only the second field of pt2 is closed:

Example 4.13: 

type point(r1,r2) = pt (r1)!float (r2)!float;

pt1 = close (pt 1.2 3.5) :: point;         % Abbreviation for point(ε,ε)

pt2 = close (pt 2.2 4.7) :: point(_,ε);

For recursive datatypes, the value contained within each field that recursively refers to the
region variable being closed must also be traversed and closed. Consider the following example
using mutable lists:

Example 4.14: 

type list(r1,r2) t = nil | cons (r1).t (r2)!(list(r1,r2) t) ;

l1 = close (1:2:3:4:nil) :: (list(ε,_) int) ;

The dynamic implication of closing the first region parameter r1 of the list l1 is that all
head fields on the spine of the list get closed, although the tail fields still remain mutable (since
r2 is not closed). This is because after closing the head field of the first cons-cell, we must
recursively traverse its tail field in order to close the region parameter r1 in the remaining list.
This process continues until we hit nil in the tail field, at which point there are no more fields
to recurse into.
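
The traversal just described can be pictured with the following Python sketch. It is an informal model only: the class Cons, the "rw"/"ro" tags, and the helper names are assumptions introduced for illustration, and None plays the role of nil.

```python
class Cons:
    """A cons-cell whose head and tail fields each carry an "rw" or "ro" tag."""

    def __init__(self, head, tail):
        self.head = [head, "rw"]     # field in region r1
        self.tail = [tail, "rw"]     # field in region r2


def set_field(field, value):
    if field[1] == "ro":
        raise TypeError("field has been closed")
    field[0] = value


def close_heads(lst):
    """Close region r1: mark every head field on the spine read-only."""
    cell = lst
    while cell is not None:
        cell.head[1] = "ro"
        cell = cell.tail[0]          # follow the tail, which itself stays mutable
    return lst
```

After close_heads, every head on the spine rejects updates while each tail can still be reassigned, mirroring the annotation (list(ε,_) int).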

Now, we show a real example involving recursive datatypes that shows the usefulness of the
close construct in building functional objects from the corresponding mutable ones. We present
an efficient implementation of the map_list function that does not even require reversing the
final list (cf. function imp_map in Example 2.6) because the list is generated from left to right
using a technique known as "open-lists" [ANP89]:



⁵ We abuse our notation slightly by referring to locations embedded inside a constructor value as field values just
like the other values present directly within the constructor, although bare locations are no longer considered to
be proper values. They only serve to define the domain of the mutable store.






Example 4.15: 

def map_list f nil = nil
 |  map_list f (x:xs) =
      close {
        hd = cons _ _;              % The expression (cons _ _) allocates a (cons ⊥,⊥)
        hd.cons_1 = f x;
        tl = hd;

        finaltl = { while not (nil? xs) do
                      newtl = cons _ _;
                      next x : next xs = xs;
                      newtl.cons_1 = f x;
                      tl.cons_2 = newtl;
                      next tl = newtl;
                    finally tl };
        finaltl.cons_2 = nil;       % The expression nil allocates a (nil)

      in hd } :: (list _) ;         % Abbreviation for (list(ε,ε) _)
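
The open-list technique can also be approximated outside Id. The Python sketch below is an assumption-laden analogue, not a translation of the Id semantics: the class Cons, the sentinel UNDEFINED, and the helper to_pylist are hypothetical, the input is any Python iterable, None stands for nil, and closing is emulated by a flag that makes further assignment raise an error.

```python
UNDEFINED = object()  # stands in for the undefined value of a fresh field


class Cons:
    """A cons-cell that rejects assignment once it has been closed."""

    def __init__(self):
        self.closed = False
        self.head = UNDEFINED
        self.tail = UNDEFINED

    def __setattr__(self, name, value):
        if getattr(self, "closed", False):
            raise TypeError("cons-cell has been closed")
        object.__setattr__(self, name, value)


def map_list(f, xs):
    """Build the result left to right, leaving a hole in the last tail."""
    it = iter(xs)
    try:
        x = next(it)
    except StopIteration:
        return None                  # nil maps to nil
    hd = Cons()
    hd.head = f(x)
    tl = hd
    for x in it:
        newtl = Cons()
        newtl.head = f(x)
        tl.tail = newtl              # fill the hole left in the previous cell
        tl = newtl
    tl.tail = None                   # terminate the list
    cell = hd
    while cell is not None:          # close every cell on the spine
        nxt = cell.tail
        cell.closed = True
        cell = nxt
    return hd


def to_pylist(cell):
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out
```

No reversal pass is needed because each new cell is appended at the open tail, and only the closed list leaves the function.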

Finally, observe that the set of locations that need to be examined for closing a given region
variable in a general algebraic datatype depends solely on its type declaration. For instance,
we know at the time of declaring the list datatype (Example 4.14) that the region variable
r1 occurs inside the type of its tail field. Therefore, we need to examine all the cons-cells on
the spine of the list in order to close the region variable r1. But we do not have to examine
the objects contained within the head fields in order to close the region r1. If r1 occurred
inside the type of the objects contained within the head fields, then the static semantics for the
close operation described below would generate a static close-error and such a program would
be rejected. Thus, an exact dynamic CLOSE rule can always be constructed for each region
variable of a polymorphic, user-defined datatype at the time that datatype is declared without
regard to how it is instantiated at various places within a program.

4.3.4 Static Semantics 

The free variables of a general algebraic datatype are defined as follows: 

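$$
\mathcal{F}(T(\rho_{1 \ldots i})\ \tau_{1 \ldots j}\ \pi_{1 \ldots k}) = \bigcup_i \mathcal{F}(\rho_i) \;\cup\; \bigcup_j \mathcal{F}(\tau_j) \;\cup\; \bigcup_k \mathcal{F}(\pi_k)
$$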

The dangerous variables of a general algebraic datatype may either be dangerous within one
of its argument types $\tau_j$ or closure types $\pi_k$, or they may occur within the type of a mutable
field of one of its constructors. In the latter case, all the type variable parameters occurring
within that field are inherently dangerous, much like the type of an object contained within a
mutable reference cell. Therefore, we define:

$$
\mathcal{D}(T(\rho_{1 \ldots i})\ \tau_{1 \ldots j}\ \pi_{1 \ldots k}) = \bigcup_i \mathcal{F}(\rho_i) \;\cup\; \bigcup_k \mathcal{D}(\pi_k) \;\cup\; \bigcup_j
\begin{cases}
\mathcal{F}(\tau_j) & \text{if } t_j \text{ occurs inside a mutable field} \\
\mathcal{D}(\tau_j) & \text{otherwise}
\end{cases}
$$



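Finally, the dangerous region variables of a general algebraic datatype are defined as follows:

$$
\mathcal{R}(T(\rho_{1 \ldots i})\ \tau_{1 \ldots j}\ \pi_{1 \ldots k}) = \bigcup_i \mathcal{F}(\rho_i) \;\cup\; \bigcup_j \mathcal{R}(\tau_j) \;\cup\; \bigcup_k \mathcal{R}(\pi_k)
$$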

The types of the primitive operators for allocation, dereference, and assignment of constructors
and their fields are defined as expected.






The static CLOSE rule also follows the discussion of Section 4.1. We only need to show how 
to perform the verification for flagging a static close-error for algebraic datatypes. This is done 
as follows: 

1. Given a type-annotated expression, (close a) :: $T(\rho_{1 \ldots i})\ \tau_{1 \ldots j}\ \pi_{1 \ldots k}$, along with an
inferred type $T(\rho'_{1 \ldots i})\ \tau'_{1 \ldots j}\ \pi'_{1 \ldots k}$, first we match the regions $\rho_{1 \ldots i}$ specified in the annotation
against the corresponding regions $\rho'_{1 \ldots i}$ of the inferred type. Null regions in the inferred
type must exactly match the corresponding regions in the annotation type. While some
region variables in the inferred type may be constrained to be closed (mapped to ε),
other region variables are simply renamed/unified to the region variable specified in the
constraint.

2. The candidate region variables so determined to be closed, say $\{r_1 \ldots r_n\}$, must not occur
inside a function closure type within the inferred type parameters $\tau'_{1 \ldots j}$ or within the
inferred closure parameters $\pi'_{1 \ldots k}$. This ensures that we do not close region variables that
are captured inside function closure types.

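3. Finally, the region variables being closed must satisfy the following test with respect to
the annotation type:

$$
\forall r \in \{r_1 \ldots r_n\}:\quad r \notin \Big[\mathcal{F}(E) \;\cup\; \bigcup_j \mathcal{F}(\tau_j) \;\cup\; \bigcup_k \mathcal{F}(\pi_k)\Big]
$$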

If any of the above tests fails, we flag a static close-error. Otherwise, the close operation is
considered to be successful.
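
The final test can be illustrated with a small Python sketch. The type representation is a hypothetical one chosen for brevity (nested tuples tagged "int", "var", "vector", or "fun"), None stands for the null region ε, and the matching of annotation regions against inferred regions in step 1 is assumed to have already produced the set of region variables to close.

```python
def free_regions(ty):
    """Region variables occurring in a type given as nested tuples."""
    tag = ty[0]
    if tag in ("int", "var"):
        return set()
    if tag == "vector":              # ("vector", region, element_type)
        _, region, elem = ty
        regions = free_regions(elem)
        if region is not None:       # None stands for the null region
            regions.add(region)
        return regions
    if tag == "fun":                 # ("fun", arg, [closure types], result)
        _, arg, closure, result = ty
        regions = free_regions(arg) | free_regions(result)
        for c in closure:
            regions |= free_regions(c)
        return regions
    raise ValueError(tag)


def close_errors(closing, env_types, type_params, closure_params):
    """Return the region variables that would be flagged as static close-errors."""
    visible = set()
    for ty in [*env_types, *type_params, *closure_params]:
        visible |= free_regions(ty)
    return sorted(set(closing) & visible)
```

With the corrected declaration of capture in Example 4.10, the closure parameter exposes (vector(r) t0), so attempting to close r is reported; with no such exposure the list of errors is empty.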

4.3.5 Soundness 

The static and dynamic CLOSE rules for general algebraic datatypes described above are direct 
extensions of the formal machinery shown for reference cells in Chapter 3. It is reasonably 
straightforward to see that we follow the same idea of specifying a fixed set of static regions 
to be closed for an identifiable set of dynamic locations. Therefore, all the semantic machinery
given in Chapter 3 extends naturally to this framework.

4.4 Functional Encapsulation in Conventional Languages 

We mentioned in Section 1.3 that the functional encapsulation mechanism presented in this
thesis would also be quite useful in a monomorphic, first-order language such as C, Pascal, or
Fortran. However, adding this mechanism to a conventional language may require a few changes
in the language and its type system, a possible change in the programming style, as well as
possible simplifications within the proposed type system itself. In this section, we outline how
all this might be achieved using C as an example.

It is clear that in order to make any kind of guarantees based on the type system, we
must have a strongly-typed language. C is not strongly-typed because it allows unrestricted
type conversion among objects at the discretion of the user via type-casting [KR88]. Using
this facility, the user may convert pointers to closable objects into non-pointer datatypes and
vice-versa, thereby completely throwing off our type analysis. Therefore, no type-casting may
be allowed in order to ensure sound, verifiable functional encapsulation.

The type system of C would obviously need to be extended with regions, although with
suitably chosen syntactic defaults regions may not appear explicitly in many cases. For instance,
the compiler may automatically assign region parameters to all struct and union type
declarations as discussed in Section 4.3.2. The compiler would also need to define a unique
memory allocation function for each declared datatype. This is necessary because, as discussed
above, we have to eliminate the use of type-casts, which are most often used to fix the type of a
freshly allocated object using the only available memory allocation function malloc.

The most important simplification in our type mechanism would be that we would no longer
need closure types. Although C allows passing function pointers as arguments and results,
functions are only declared at the top-level and they may only have free identifiers that are also
declared at the top-level. Therefore, the types of such free identifiers would always be visible
within the global type environment and can never be closed accidentally. In other words, we
do not need to keep track of the types of the free identifiers of a function because such types
would always be present in its enclosing type environment anyway.⁶ This greatly simplifies our
typing machinery and makes it even more intuitive and easy to use.

Finally, we must point out that functional encapsulation is useful only if we localize the
allocation and construction of objects to nested program blocks. This facility encourages a
programming style where we dynamically allocate and update an object in a deeply nested
block, and then close and return that object into the enclosing block where it may be used
functionally. This style is certainly possible in C and Pascal but may preclude some earlier
versions of Fortran due to the lack of block-structure and dynamic memory allocation.

4.5 Conclusions 

4.5.1 Summary of Part I 

In the preceding chapters we have presented a powerful type system that fulfills our goal of
providing a sound and verifiable type abstraction mechanism between the high-level functional
layer and the low-level imperative layer of a polymorphic programming language. We started
with the problem of implementing functional array constructs present in our high-level language
in terms of low-level imperative program fragments written in a small kernel language without
sacrificing storage efficiency or parallelism. In the process, we introduced a new construct
within the kernel language called "close" that changes the view of a mutable data-structure
from an imperative to a functional one. The type system statically verifies the soundness of
such a change and guarantees that successfully closed objects are never updated again during
execution.

We also showed how to extend the use of the close construct to complex data-structures
within the language including arrays, tuples, functions, and general algebraic datatypes. We
discussed issues of language design and specification of closing such data-structures and their
effect on other language features such as type polymorphism and dynamic memory synchronization
protocols. Our proposal for syntactically specifying closable objects blends nicely with
already existing mechanisms of specifying type declarations and type annotations for program
expressions.

The type abstraction mechanism described in this thesis helps both compiler and language 
designers as well as the end-users. On the one hand, it helps to reduce the size of the compiler 
by permitting efficient implementations of high-level, functional constructs (e.g., make_vector 
in Example 4.5 and map_list in Example 4.15) to be pushed into system libraries rather than 
being implemented within the compiler as primitives. On the other hand, it provides a tool 



This is also true in Pascal and Fortran even though Pascal allows internal function declarations [JW75]. This 
is because in all these languages functions are never passed outside the scope of their definitions. 






for the end-user to design arbitrary new functional data-structures more efficiently using imperative 
kernel constructs and then safely close them (e.g., histogram in Example 2.16 and 
polar2rect in Example 4.4). In this sense, our type system provides a safe and controlled abstraction 
mechanism for the end-user to exploit the power and efficiency of low-level, imperative 
constructs without destroying the clean semantics of high-level constructs. 

4.5.2 Implementation Status 

The type system described in this thesis is currently unimplemented. Therefore, our claims of 
displacing wired-in implementation of functional data constructors within the Id compiler in 
favor of system libraries, and user-level flexibility in implementing new functional abstractions 
are yet to be tested. Currently, the Id compiler uses several internal "hacks" to provide these 
functional abstractions which would clearly be unsound if exposed to the user directly.7 Our 
typing machinery would have the effect of cleaning and legitimizing these hacks into proper 
kernel language features. Our type system would also combine three different type declarations 
used for M-structure, I-structure, and functional data objects into a single declaration as 
discussed in Section 4.2.4. 

Currently, the Id language is undergoing major revisions and in its next incarnation as pH 
[NAH93] we hope to include some of the ideas embodied in this thesis. 

4.5.3 Future Work 

As mentioned above, the obvious first task for us is to implement this type system fully and 
study its usefulness not only in terms of the semantic cleanliness but also its implementation 
efficiency and ease of use. We would like to implement this system both for Id (and pH) and 
for a restricted subset of the C language as outlined in Section 4.4. Below, we discuss some 
alternate directions for future research. 

Theoretical Improvements 

There are several aspects of the current research that need more detailed scrutiny. Throughout 
this thesis, we have used a strict, sequential dynamic semantics for our kernel language. 
We were able to do this because the problem of closing imperative data-structures is largely 
orthogonal to the issues of parallelism and synchronization which would have only made the 
formalization of the soundness proofs much harder. But it would be useful to show the sound- 
ness proofs directly in a parallel setting. This would also allow us to directly model the different 
closing strategies required with different memory synchronization protocols as discussed in Sec- 
tion 2.3.5. We feel that a graph rewriting framework such as [AA93] would be more appropriate 
for this purpose than the relational semantics approach taken here. 

Applications to Other Compiler Analyses 

This type system may also be used to infer useful static information that is conventionally 
determined using dataflow analysis or abstract interpretation. For example, we know that the 
static verification strategy for the close construct provides a limited form of object escape 
analysis. It guarantees that there are no additional references to the object being closed other 



7 The current version of the Id compiler uses typeconverter declarations that simply change the type of an 
object without any semantic verification. It also uses internal pragmas to "fix" the functional polymorphism of 
array and list comprehension desugaring. 






than the reference being returned from the close expression. This implies that the enclosing 
program fragment that receives the closed object has exclusive access to that object. If we do 
not make the object read-only upon closing, then this type mechanism effectively provides a 
static way of verifying exclusive dynamic access to a mutable object without using any synchronization 
primitives (such as semaphores) or single-threading the object through the entire 
program. The enclosing program fragment could make exclusive, unsynchronized read/write 
accesses to the object for some time and then pass out multiple references to other sub-programs. 
All such references may again be brought together and again checked for escape in an enclosing 
scope. 

Another important observation is that the dynamic life-time of an object that is shown to be 
closable at the boundary of a close expression, but is not actually returned from that expression, 
is guaranteed to be bound to the scope of that close expression. This is because no references 
to that object may escape this scope. This information may be used to allocate such objects 
on the stack instead of the heap as shown in [TT93], or to insert additional code at compile-time to 
reclaim that storage automatically along the lines of [HJ92]. 






Part II 



Types in Run-time System Design: 
Type Reconstruction 






Chapter 5 

A Typed Run-time System 



5.1 Introduction 

Traditionally, programming environments of dynamically-typed languages such as Lisp or Smalltalk 
maintain type information in the form of run-time type descriptors on every object. This 
information may be used, for instance, to detect run-time type-errors, to dispatch to different 
handlers for a given operation based on the type of the arguments, and to distinguish pointer 
data from non-pointer data for the purpose of garbage collection. Although very flexible in 
design, such language implementations pay the price of managing type-tags either in the form 
of complex specialized hardware or in the form of extra space and time requirements in software. 

Languages geared towards high performance computation such as C or Fortran take the 
other extreme. They aim for a very simple and efficient run-time system with no type informa- 
tion to be maintained at run-time. The user is made directly responsible for complex tasks that 
may require run-time type information such as ensuring type consistency and automatic storage 
management. If necessary, the compilers for these languages can be explicitly instructed to generate 
static type information to be used for specific run-time applications such as source-level 
debugging. 

Several important questions arise at this point. What is the advantage of having type infor- 
mation available at run-time? What specific applications may use run-time type information? 
How much type information is desired: complete source-level types or a partial specification? 
What language design features may help or complicate the task of making run-time type in- 
formation available? How much of this type information can be pre-computed by the compiler 
and how? Do we need to carry the type information throughout execution or can it be recon- 
structed on demand? What is the run-time cost of such type maintenance or reconstruction? 
And finally, how do a typed language and its run-time system compare in terms of overall 
performance, program reliability, and user flexibility to other systems? 

In Part II of this thesis, we attempt to answer some of the above questions in the context of 
the Id programming language and its run-time environment. We study how source-level type 
information can be propagated through the compiler and made available during the execution 
of a program. We also discuss specific applications that use this information at run-time. 

5.2 Design Issues for a Typed Run-time System 

Several language design features affect the availability and the accuracy of type information 
during the execution of a program. Likewise, run-time system design decisions affect the overall 






Figure 5.1: Design Issues for a Typed Run-time System. 
[Diagram labels: Strongly-typed, Weakly-typed, Statically-typed, Dynamically-typed, Polymorphic, 
Monomorphic, Untagged. Languages placed in the diagram: Id, C, Lisp, Pascal.] 



cost of computing and propagating this type information. Figure 5.1 shows several such design 
issues and classifies some existing programming languages on their basis. We discuss these 
issues below. 

5.2.1 Strong vs. Weak Typing 

Strongly-typed languages such as Pascal, Lisp, or Standard ML provide a consistent model of 
assigning a type to every data object and every sub-computation in a program. Computations 
are allowed to proceed only if provided with objects of the right type. Enforcing type consistency 
allows run-time type information to serve as a reliable description of the computation being 
performed at any time. Therefore, it makes sense to use this information, if available, for 
applications that operate on a wide variety of run-time data and need some mechanism to 
identify and distinguish among them. Applications such as displaying objects in a source 
debugger, marking objects in a garbage collector, and object I/O fall into this category. 

Weakly-typed languages such as C or Fortran permit the user to arbitrarily coerce the type 
of an object to another type. This makes the currently assigned type of an object a poor 
description of its actual contents. It is still possible to view an object according to its currently 
assigned type, but there is no guarantee that it provides a complete and accurate description 
of the object. Therefore, providing reliable type information at run-time is possible only in a 
strongly-typed system. 

5.2.2 Static vs. Dynamic Typing 

Compilers for statically-typed languages such as Pascal or Standard ML enforce the type 
consistency expected from a strongly-typed program at compile-time. This frees up the system 
from the responsibility of checking for type consistency at run-time. Some modern languages 
like Haskell also provide systematic mechanisms to resolve overloading of operators and selection 
of methods at compile-time based on the static types of their arguments [WB89]. Therefore, 
static typing offers many of the advantages of dynamic availability of type information without 
actually carrying that information at run-time. Moreover, all the static type information may 
be saved and used in optimizations during the compilation phase itself or in other run-time 






applications during program execution. However, additional work may be needed to reproduce 
the desired information at run-time when demanded. 

5.2.3 Tagged vs. Untagged Object Model 

A simple way to provide type information at run-time is to tag every object: a few bits (usually 
one or two) in every word may be used as a tag to distinguish scalar objects from pointers to 
heap objects. More information about the type and size of objects may be kept in an object 
header. All dynamically-typed languages such as Lisp and Smalltalk use extensive tagging of 
objects in order to perform type consistency checks at run-time. Some implementations of 
statically-typed languages such as the Standard ML of New Jersey [App90] also make use of 
object tagging, usually for the benefit of the garbage collector. 

Tagging every object is costly. Keeping tag bits in every word reduces the range of representable 
scalars and pointers in conventional architectures, and the user application also pays 
the additional cost of tag maintenance. Sometimes, scalar values (usually floating point numbers) 
may be boxed in a heap data-structure in order to preserve their full range. This incurs 
the additional cost of allocating the box and accessing it indirectly. 
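
The range cost of in-word tags can be made concrete with a minimal sketch. The 32-bit word size, the one-bit tag, and the tag values below are assumptions chosen for illustration and are not taken from any particular implementation discussed here; the function names are hypothetical.

```python
# Hypothetical 32-bit word with a 1-bit tag in the least significant position.
WORD_BITS = 32
TAG_BITS = 1
INT_TAG = 1   # assumption: low bit 1 marks an immediate integer
PTR_TAG = 0   # assumption: low bit 0 marks a word-aligned heap pointer
PAYLOAD_BITS = WORD_BITS - TAG_BITS


def tag_int(n):
    """Pack a signed integer into a tagged word, or fail if it does not fit."""
    lo = -(1 << (PAYLOAD_BITS - 1))
    hi = (1 << (PAYLOAD_BITS - 1)) - 1
    if not lo <= n <= hi:
        # The tag bit costs one bit of range; such a value would need boxing.
        raise OverflowError(f"{n} is outside [{lo}, {hi}]")
    return ((n << TAG_BITS) | INT_TAG) & ((1 << WORD_BITS) - 1)


def untag_int(word):
    """Recover the signed integer stored in a tagged word."""
    payload = word >> TAG_BITS
    if payload >= 1 << (PAYLOAD_BITS - 1):   # restore the sign
        payload -= 1 << PAYLOAD_BITS
    return payload


def is_pointer(word):
    """A collector would use this test to separate pointers from scalars."""
    return (word & 1) == PTR_TAG
```

Every arithmetic operation on tagged integers has to strip and restore the tag in this way, which is the tag maintenance cost referred to above.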

Keeping objects untagged simplifies the memory model and eliminates the space and time 
overheads, but no type information is directly available at run-time. In weakly-typed languages 
such as C or Fortran, the user is directly held responsible for generating and propagating 
consistent type information at run-time. In statically-typed languages such as Pascal or Id, the 
compiler and the run-time system may share the responsibility for carrying the type information. 
The compiler may generate detailed symbol tables for each function in the program. The run- 
time system may load and process the information before program execution or upon request 
from another application. 

5.2.4 Type Maintenance vs. Type Reconstruction 

Recently, several type reconstruction schemes have been proposed for statically-typed polymorphic 
languages that do not incur the run-time tag management overhead [App89, Gol91, 
GG92]. In these schemes, static type information may be combined with clues from the dynamic 
state of the machine (the call stack) to automatically reconstruct the run-time type of 
most run-time objects. Therefore, with a small cost of type reconstruction, the type-tags on 
such objects may be safely dropped without compromising the ability to determine their exact 
run-time types. 

If the semantics of a language necessitates a tagged or boxed representation for objects, or 
if special hardware support for tags is available, then run-time type reconstruction is probably 
not the right choice. For example, compiler-directed type reconstruction is impossible in a 
dynamically-typed language such as Lisp because the language does not enforce sufficient static 
type restrictions on user programs in order for a compiler to gather all the necessary type 
information for later reconstruction. Maintaining tags on every object is the only way to ensure 
dynamic type consistency. Similarly, in the implementation of lazy languages such as Haskell 
[PJ92], all objects are boxed into closures to ensure lazy evaluation semantics. These closures 
can easily identify themselves and the object they contain via their code pointers. Independent 
type reconstruction does not provide any advantage in this situation. 

However, for the class of statically-typed languages that follow applicative-order evaluation,1 



1 By applicative-order evaluation, we mean languages that evaluate function arguments before or in parallel 
with the invocation of the function. 






type reconstruction enables substantial representational savings without sacrificing any run- 
time information. The object representations can be made clean and simple just like in C 
and Fortran, without compromising type consistency or the ability to use type information at 
run-time. Of course, we need to ensure that complete type reconstruction is possible for all 
run-time objects under all circumstances. However, the existing schemes [App89, Gol91, GG92] 
do not provide this guarantee. 
In particular, polymorphism and higher-order functions pose significant problems as discussed 
below. 

5.2.5 Polymorphism and Higher-order Functions 

Language features such as polymorphism and higher-order functions significantly complicate 
the problem of making exact type information available in a run-time system with untagged 
objects. Polymorphic functions are designed to be reusable with various types of data objects; 
therefore no clue about the type of an object may be associated with the definition of such 
a function. The exact run-time type of a particular application of a polymorphic function is 
usually an instantiation of its static type and must be derived from the use of the function at 
that application site. The run-time system needs to compute such instantiations upon a type 
reconstruction request. 

Similarly, higher-order functions take function closures as arguments and produce closures 
as results. These function closures may encapsulate hidden objects that are bound to the free 
identifiers of the function. Unfortunately, even an exact instantiation of the type of a function 
closure may not reflect the types of the objects captured within its environment. Therefore, the 
types of objects hidden within higher-order function closures may be impossible to reconstruct. 
We will examine some of these problems and their possible solutions in Chapter 6. 
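
The shape of the closure problem can be illustrated with a small sketch. It is written in Python purely for illustration: Python keeps run-time type descriptors on every object, so nothing is actually lost there, and the names `make_length`, `count_ints`, and `count_strs` are hypothetical. The point is only that the visible type of the closure, a function of no arguments returning an integer, never mentions the element type of the list it captures.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def make_length(xs: List[T]) -> Callable[[], int]:
    """Return a closure whose visible type, () -> int, does not mention T."""
    def length() -> int:
        return len(xs)   # xs is a free identifier captured in the environment
    return length


count_ints = make_length([1, 2, 3])
count_strs = make_length(["a", "b"])

# Both closures have the same visible type, yet their environments hold
# lists of different element types.
captured_ints = count_ints.__closure__[0].cell_contents
captured_strs = count_strs.__closure__[0].cell_contents
```

In an untagged run-time system, instantiating the type of `count_ints` or `count_strs` exactly would still say nothing about whether the captured list holds scalars or pointers.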

5.2.6 Type Inference vs. Type Declaration 

Type inference is a convenient mechanism that frees the user from the task of declaring every 
identifier in the program with an appropriate type. Most modern programming languages such 
as Standard ML, Haskell, and Id use a systematic type inference system [Mil78]. Even languages 
favoring type declaration such as Pascal and C perform some ad hoc type inference in order to 
support automatic type coercions. 

Type reconstruction may be thought of as run-time type inference on the dynamic state of 
the computation, although a large amount of that information is pre-computed statically within 
the compiler. The use of type reconstruction at run-time is orthogonal to whether the compiler 
uses type inference or type declarations in order to collect the necessary static type information. 
Providing the type information within the program in the form of type declarations does not 
reduce the complexity of making that information available at run-time. Because of the problems 
discussed above, the compiler still has the task of saving all the necessary information in the 
appropriate form and making sure that complete type reconstruction is possible for all objects 
at run-time. 

5.3 Our Approach 

Id is a strongly and statically-typed language. Furthermore, it supports a polymorphic type 
inference system and uses an untagged run-time system. Our goal is to use run-time type 
reconstruction in order to determine the exact type of all objects within the Id run-time system. 






As mentioned earlier, the existing schemes [App89, Gol91, GG92] are unable to reconstruct the 
types of some objects. We would like to fix this situation so that the exact type of all run-time 
objects may be reconstructed automatically. 

Our proposed scheme lies somewhere in-between the two extremes of complete run-time 
tagging of objects (a la Lisp, Standard ML) and carrying no type information at all (a la C) 
without compromising the goal of complete run-time type reconstructibility. We do not tag 
every run-time object, although a small amount of explicit type information may have to be 
carried within some higher-order, polymorphic functions in order to allow complete run-time 
type reconstruction. We analyze the user program at compile-time to detect such cases and 
insert the additional type information automatically. Essentially, our scheme can be viewed 
as compiler-directed explicit tagging for such run-time objects. We also provide a type re- 
construction algorithm and prove its correctness. The success of our scheme depends on the 
fact that the explicit type information needs to be inserted in very few cases that essentially 
plug the informational holes in the previous schemes and that it can be set up by the compiler 
automatically with little run-time support and overhead. 

The main contribution of this work is that we guarantee complete type reconstruction. As 
we will see in Chapter 7, our current system slightly restricts the acceptable set of type-correct 
programs in order to provide this guarantee. On the other hand, this guarantee opens the way 
for a universal framework for supporting various language and system applications that need to 
use exact object type information at run-time. We discuss some of these applications below. 

5.4 Applications of Complete Run-time Type Reconstruction 

5.4.1 Polymorphic Source Debugging 

A source debugger for a statically-typed, polymorphic language is an ideal application for run-time 
type reconstruction. In a debugger, it may be necessary to display the values of any or 
all of the variables associated with a given procedure activation. Without any help from the 
run-time system, the static type signatures of polymorphic objects are usually insufficient to 
traverse and display their full contents. For example, the append function on lists has the 
polymorphic static type ∀t0. (list t0) -> (list t0) -> (list t0). The function may be used in 
various contexts to append various kinds of lists. In each case, we need to reconstruct the full 
run-time type of its arguments in order to display their contents appropriately to the user. 

Another interesting property of source debugging is that type reconstruction is required only 
for those objects (or function activation frames) that are requested by the user for displaying. 
The entire state of the machine need not be reconstructed at once. Moreover, debugging does 
not impose any serious performance constraints for type reconstruction. Users are generally 
willing to tolerate a reasonable cost for displaying an object which would now also include the 
cost of reconstructing its type. 

5.4.2 Tagless Garbage Collection 

Type reconstruction may also be used within a run-time system in order to perform garbage 
collection without maintaining any type information on the heap objects themselves. Abstractly, 
a garbage collector performs two functions: it distinguishes live objects from those 
that are garbage (live-object detection), and it reclaims the storage allocated to objects that 
are garbage (dead-object reclamation). For live-object detection, the garbage collector must be 






able to distinguish scalar objects from heap-allocated objects and determine their sizes (object 
identification). The actual type of an object is very useful for this purpose. 

Conventional techniques for object identification operate with a very simple memory model 
and make little or no use of language and compiler-specific information. Pointers may be tagged 
using one bit to distinguish them from scalar values and objects may be provided with header 
tags or may be allocated in separate areas of memory to keep track of their size. The reader is 
referred to a recent survey of such techniques in [Wil92]. 

Unlike source debugging, garbage collection does not require complete source type information 
per se, but additional type information may be helpful in optimizing the marking of 
live objects. For instance, it may be possible to entirely skip the traversal of large arrays while 
searching for embedded pointers to heap objects, if the exact run-time type of their elements 
turns out to be a scalar. Clever compilers and run-time systems that tag every object [App90] 
may sometimes be able to encode such information within the header of the array if its type 
is statically known to be a scalar, but this is not possible with polymorphic array constructors 
such as the make_vector function of Example 2.1 which could be used in both scalar and 
structured array computations. 

An alternative solution for object identification is to use complete run-time type reconstruction. 
This technique enables garbage collection to be performed in an untagged run-time system, 
saving valuable application time and space spent in continuous tag maintenance. Complete type 
information also paves the way to type-based optimizations in marking flat data-structures as 
discussed above. But one has to weigh these advantages against the cost of performing type 
reconstruction whenever garbage collection is requested. 

As an example, a simple "mark-and-sweep" tagless garbage collector would work as follows. 
When garbage collection is initiated, the first step would be to reconstruct the types of the 
root set of heap objects that are either stored in global variables or pointed at from within the 
function activation frames. The reconstructed type information would then be used to guide 
the garbage collector in identifying and traversing the reachable heap objects and marking them 
as live. Finally, unmarked objects would be reclaimed as garbage. We describe such a scheme 
in Chapter 8. 
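
The marking phase just described can be sketched as follows. This is only an illustration under stated assumptions, not the scheme of Chapter 8: the heap is modeled as a dictionary from addresses to lists of untagged words, types are hypothetical tuples such as `("array", elem_type)`, and the root set is assumed to arrive already paired with reconstructed types.

```python
# Hypothetical type encoding: ("int",), ("array", elem), ("tuple", t1, t2, ...).
INT = ("int",)


def is_scalar(ty):
    return ty == INT


def mark(word, ty, heap, marked):
    """Mark everything reachable from an untagged word, guided only by its type."""
    if is_scalar(ty) or word in marked:
        return                       # scalars are not addresses; skip revisits
    marked.add(word)
    fields = heap[word]
    if ty[0] == "array":
        if is_scalar(ty[1]):
            return                   # flat array: no embedded pointers to follow
        for w in fields:
            mark(w, ty[1], heap, marked)
    elif ty[0] == "tuple":
        for w, field_ty in zip(fields, ty[1:]):
            mark(w, field_ty, heap, marked)


def collect(roots, heap):
    """roots: list of (word, reconstructed_type). Returns the freed addresses."""
    marked = set()
    for word, ty in roots:
        mark(word, ty, heap, marked)
    dead = set(heap) - marked
    for addr in dead:                # sweep: reclaim unmarked objects
        del heap[addr]
    return dead
```

For instance, with a root of type `("tuple", ("array", INT), INT)` whose second field happens to hold the number 300, the object at address 300 is not followed, because the type says that word is a scalar; no tag bit is consulted.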

5.4.3 Object-based I/O 

Another application that may benefit from run-time type information is I/O. Most program- 
ming languages offer either stream-based or continuation-passing I/O primitives for a few basic 
datatypes that may be used to build more complex read/write functions explicitly (e.g., C, Pascal, 
Haskell). Typically, I/O formats and styles for complex objects are directly controlled by the 
user. Polymorphic objects are handled using explicitly parameterized I/O routines. With run-time 
availability of type information, I/O handling for complex (even polymorphic) objects can 
be made automatic. The structure of an object may be directly determined from its type. For 
fixed sized objects, the size of the object may also be ascertained from its type. For dynamically 
sized arrays, the size information may be kept within the object itself. Given this information, 
an entire complex object may be read or written easily using its type to select and guide the 
output format. 
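
A minimal sketch of such type-directed output is given below. The heap model (a dictionary from addresses to lists of untagged words), the tuple encoding of types, and the function name `show` are all assumptions made for illustration; the size of a dynamically sized array is taken from the stored object itself, as suggested above.

```python
# Hypothetical type encoding: ("int",), ("array", elem), ("tuple", t1, t2, ...).
INT = ("int",)


def show(word, ty, heap):
    """Render an untagged word as text, using its type to choose the format."""
    if ty == INT:
        return str(word)
    if ty[0] == "array":
        # The element count comes from the object; the element type from ty.
        return "[" + ", ".join(show(w, ty[1], heap) for w in heap[word]) + "]"
    if ty[0] == "tuple":
        fields = zip(heap[word], ty[1:])
        return "(" + ", ".join(show(w, t, heap) for w, t in fields) + ")"
    raise ValueError(f"unknown type {ty!r}")
```

The same word can be rendered differently under different types, which is why the exact run-time type, rather than a pointer/scalar distinction alone, is needed for I/O.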

The run-time systems of dynamically-typed polymorphic languages such as Lisp or Smalltalk 
usually offer such I/O capability automatically for each user-defined data-structure within the 
program. This is possible because all objects in such languages carry type-tags which may be 
used to guide the generic I/O functions according to the structure of that object. With type 
reconstruction, this capability may also be provided in a statically-typed language with an 






untagged object model. Moreover, just like tagless garbage collection, it may also be possible 
to generate object-based I/O routines that are specialized to a given object type and hence are 
more efficient than generic I/O routines that interpret the reconstructed type at run-time. 

Another possible use of complete run-time type reconstruction and object-based I/O is 
in periodic check-pointing of the entire machine state for long-running programs. Complete 
type reconstruction would enable traversal and recording of all the dynamic data-structures 
participating in the computation including the activation stack, the global environment, and 
all the accessible objects residing on the heap. 

5.5 Outline 

In the rest of Part II we study the problem of complete run-time type reconstruction for Id 
programs in detail and describe some of its applications implemented within the Id run-time 
system. In Chapter 6, we intuitively analyze the problem of polymorphic type reconstruction 
by means of examples, describe the compiler and run-time system support required, and outline 
a reconstruction algorithm. Chapter 7 formalizes these ideas in the context of the Kernel Id 
intermediate language, presents a complete reconstruction algorithm, and proves its correctness. 
Finally, in Chapter 8 we present tagless garbage collection as an application of complete 
type reconstruction and compare its performance with a conservative garbage collector and a 
compiler-directed explicit allocation/deallocation scheme. 






Chapter 6 

Compiler-directed Polymorphic 
Type Reconstruction 



In this chapter, we informally present the problem of complete run-time type reconstruction for 
higher-order, polymorphic languages such as Id and discuss some of its solutions. In Section 6.1, 
we briefly describe the problem via examples and discuss why the existing approaches are 
insufficient to guarantee complete run-time type reconstruction. In Section 6.2, we provide the 
basic framework for doing complete type reconstruction, characterizing the analysis required at 
compile-time and the reconstruction strategy to be followed at run-time. Next, in Section 6.3 
we present a compilation scheme that identifies and inserts the necessary type information 
within the user program to guarantee complete type reconstruction at run-time. Subsequently, 
Section 6.4 walks through a reconstruction example. In Section 6.5, we show a series of compiler 
optimizations and variations on our compilation scheme that may further reduce the book-keeping 
overhead of the current scheme. Finally, in Section 6.6 we point to two implementations 
of our type reconstruction strategy. 

6.1 Type Reconstruction Problem 

The problem of type reconstruction for Id can be described as follows. At some point during the 
execution of a program, we wish to take a snapshot of the state of the machine and determine 
the type of every object accessible within the computation. We assume that the program is 
typed statically and that the run-time environment does not maintain any type information 
implicitly. In particular, Id run-time objects do not carry any type-tags. 

Clearly, only polymorphic objects and functions pose some challenge; complete type information 
can be obtained at compile-time for monomorphic objects. Also note that the exact 
nature of the desired information depends on the application that uses it. For example, a 
source debugger may wish to inspect any particular object from the current run-time state of 
the machine whereas a garbage collector only needs to traverse those that are still in use. Also, 
most garbage collectors only need to differentiate between scalars and pointers to structures 
while a source debugger needs exact type information in order to display the object properly. 
In general, we would like to devise a flexible strategy that can be optimized according to the 
level of information desired. 






6.1.1 Basic Type Reconstruction Scheme 

Usually, the compile-time type of an object is a good starting point for the reconstruction 
of its run-time type. In case of polymorphic functions, the types of the objects contained 
within the function body would depend on the types of the arguments that it receives at a 
given application site. Appel [App89] first noted that if the exact types of the arguments of 
a polymorphic function were known at run-time, then its entire body could be instantiated 
appropriately using its compile-time typing. The exact types of the arguments present at an 
application site may, in turn, be determined by reconstructing the type of the parent's body 
containing that application site and so on. 

Goldberg [Gol91] made the above ideas more concrete in the context of tagless garbage 
collection for strongly-typed, sequential languages. Although his scheme applied specialized 
garbage collection routines to heap objects directly without explicitly reconstructing their types, 
the basic mechanism of type reconstruction remains the same and may be described as follows 
in the context of parallel program execution: 

Compile-time support: 

1. The program is type-checked completely. 

2. For each user-defined function within the program, the types of all its arguments and 
the types of its local and free variables are recorded in a type-map. This type-map 
serves as a static template for the function's run-time activation frame. 

3. For each function application site, the full static type instantiation of the function 
being applied is also recorded within the type-map of the enclosing function definition. 

Program invocation and execution: 

1. The top-level expression is type-checked and the types of its command-line arguments 
are recorded. 

2. The top-level expression is now executed, expanding the run-time state of the machine 
into a tree of activation frames (a stack of activation frames in a sequential 
language). Each function application evaluates in the context of its own activation 
frame which stores its actual arguments and saves the values of temporary local 
computations. 

3. The machine may be halted at any point during execution and type reconstruc- 
tion may be requested for a particular frame present within the current dynamic 
activation tree (the activation stack in a sequential language). 

Run-time type reconstruction: 

1. Firstrthe function corresponding to the current activation frame is identified and its 
static type-map is obtained. 

2. If the current function is not polymorphic then no type reconstruction is required. 
Otherwise, its parent activation frame and application site are identified using the
return address information in the current frame. 

3. If the parent activation frame is the root of the dynamic activation tree then the
exact types of the arguments supplied to the current function are already known.
Otherwise, the process of type reconstruction is repeated for the parent frame by
going back to Step 1.

4. Given the exact types of the arguments of a function, its static type-map is fully
instantiated by matching the actual types of the arguments to their static types.
This reconstructs the current activation frame and also provides the exact types of
the arguments present at any application sites within the body of that function.

[Figure 6.1: The Run-time State of Computation in Example 6.1. The figure shows the activation tree (root, map, and enlist frames) and the heap (map closure, enlist closure). The root application site supplies an argument of type (list int). Type Reconstruction: t0 = t1 = int.]

As shown above, the reconstruction process may continue possibly up to the root of the
activation tree where the run-time types of the user-supplied arguments are available. At that
point, all polymorphic functions in the call chain can be correctly instantiated, revealing the
run-time types of their internal objects. In the context of sequential execution, Goldberg [Gol91]
also showed that the entire state of the machine may be reconstructed in one pass by starting 
from the root frame at the bottom of the activation stack and working towards the most recent 
frame at the top of the activation stack. 

We illustrate the above reconstruction scheme with a small example:¹

Example 6.1: 

def enlist x_{t0} = x:nil;
def map f nil = nil
  | map f (y:ys)_{(list t1)} = (f y_{t1}):(map f ys);

map enlist (1:2:nil)_{(list int)};



¹ All the examples in this chapter use the Id language syntax [Nik91]. Briefly, functions are introduced with
a def keyword and allow pattern-matching on their arguments. (:) is the infix cons operation.






The function enlist has a static type $\forall t_0.\ t_0 \to (\mathit{list}\ t_0)$ and map has a static type
$\forall t_1 t_2.\ (t_1 \to t_2) \to (\mathit{list}\ t_1) \to (\mathit{list}\ t_2)$. We also show the type instances of some internal identifiers as
subscripts. The evaluation of the top-level expression (map enlist (1:2:nil)) dynamically
unfolds into a tree of activation frames as shown in Figure 6.1.

If we wish to examine the x argument of enlist during one of these calls, then the run-time
instantiation of its static type $t_0$ may be determined by following up the dynamic chain of
activation frames into its application site within the map function. Here, $t_0$ may be related to
the static type $t_1$ of the actual argument y at that application site. This relates to the type of
the second argument $(\mathit{list}\ t_1)$ of map which is found to be $(\mathit{list}\ \mathit{int})$ at the root application site.
Then, both $t_1$ and $t_0$ can be instantiated to int giving the actual type of x as desired.
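
The walk up the activation chain for Example 6.1 can be sketched in Python. This is only an illustration under stated assumptions: the tuple encoding of types, the TYPE_MAPS layout, the Frame record, and the use of an application-site label in place of a return address are all hypothetical, and each application site records just the static types of its arguments rather than the full instantiation of the applied function.

```python
from collections import namedtuple

# A type is either a str (a type variable such as "t0") or a tuple
# (constructor_name, *argument_types), e.g. ("list", ("int",)).
INT = ("int",)

def lst(t):
    return ("list", t)

def fn(a, b):
    return ("->", a, b)

def match(static, actual, subst):
    """Match a static type against an actual type, extending subst in place."""
    if isinstance(static, str):                      # type variable
        if subst.setdefault(static, actual) != actual:
            raise TypeError("inconsistent instantiation of " + static)
    elif (isinstance(actual, str) or static[0] != actual[0]
          or len(static) != len(actual)):
        raise TypeError("constructor mismatch")
    else:
        for s, a in zip(static[1:], actual[1:]):
            match(s, a, subst)
    return subst

def apply(subst, t):
    """Instantiate the type variables of t using subst."""
    if isinstance(t, str):
        return subst.get(t, t)
    return (t[0],) + tuple(apply(subst, a) for a in t[1:])

# Hypothetical type-maps: parameter types plus, for each application site,
# the static types of the arguments supplied there.
TYPE_MAPS = {
    "root":   {"params": [],
               "sites": {"map enlist (1:2:nil)": [fn(INT, lst(INT)), lst(INT)]}},
    "map":    {"params": [fn("t1", "t2"), lst("t1")],
               "sites": {"f y": ["t1"],
                         "map f ys": [fn("t1", "t2"), lst("t1")]}},
    "enlist": {"params": ["t0"], "sites": {}},
}

# A frame records its function, its parent frame, and the application site in
# the parent's body (standing in for the return address).
Frame = namedtuple("Frame", "function parent site")

def reconstruct(frame):
    """Return the instantiation of the type variables in frame's type-map."""
    if frame.parent is None:                         # root: types already exact
        return {}
    parent_subst = reconstruct(frame.parent)         # Steps 1-3, recursively
    site = TYPE_MAPS[frame.parent.function]["sites"][frame.site]
    actuals = [apply(parent_subst, t) for t in site]
    subst = {}
    for static, actual in zip(TYPE_MAPS[frame.function]["params"], actuals):
        match(static, actual, subst)                 # Step 4
    return subst

root = Frame("root", None, None)
map1 = Frame("map", root, "map enlist (1:2:nil)")
map2 = Frame("map", map1, "map f ys")
enlist2 = Frame("enlist", map2, "f y")
print(reconstruct(enlist2))   # {'t0': ('int',)}
```

The recursion mirrors the text: the type of x in the second enlist call is found only after the two enclosing map frames have been instantiated from the root application site.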

6.1.2 Problems with Closures and Free Variables 

Unfortunately, the above scheme is incomplete. Goldberg and Gloger [GG92] noted that sometimes
types of objects hidden inside a closure are impossible to reconstruct. Consider the
following example:

Example 6.2: 

def f2 x_{t0} y_{t1} = y;

g2 = if ... then f2 1_{int} else f2 "foo"_{string};

g2 2;

Here, f2 has a type $\forall t_0 t_1.\ t_0 \to t_1 \to t_1$, and therefore g2 gets bound to a partially applied
function closure with type $\forall t_2.\ t_2 \to t_2$ that says nothing about the type of the data hidden inside
it. In fact, this type cannot be determined at compile-time because it depends on the value of
the predicate (...). Besides, during the evaluation of (g2 2) the return address information on
the call stack would point to the application site of g2, which does not help in determining the
contents of that closure either. Thus, we cannot reconstruct the type of the argument x within
the activation of f2 because the computation that created its closure is no longer available as
part of the dynamic activation tree.

It may appear that this problem arises only when an argument of a function is never used
within its body, but the following example adapted from [GG92] shows that this is not the
case:²

Example 6.3: 

def f3 x_{(list t0)} =
  { def h3 z_{t1} = if length x_{(list t0)} == 1
                    then z:nil
                    else z:z:nil;
    in h3 };
g3 = if ...
     then f3 (1:nil)_{(list int)}
     else f3 (true:nil)_{(list bool)};

g3 2_{int};



Here, the type of the function f3 is $\forall t_0 t_1.\ (\mathit{list}\ t_0) \to t_1 \to (\mathit{list}\ t_1)$, and therefore the type of
the computed closure g3 is $\forall t_2.\ t_2 \to (\mathit{list}\ t_2)$. During the evaluation of the application (g3 2),



² In Id syntax, a block-expression (bounded by {}) encloses a set of identifier bindings. The result of such a
block is the value of the expression following the keyword "in", evaluated within the scope of the bindings.






no information is available in the activation tree about whether this closure contains a list of booleans
or a list of integers. Goldberg and Gloger argue in [GG92] that since h3 does not use the
elements of its free variable list x but only its spine (to compute its length), a garbage collector
can ignore these elements and copy just the spine. But this approach creates problems if these
structures are shared in many places, and it is quite unsatisfactory for a source debugger that
needs to display the full object.

The problem of not being able to reconstruct the exact type of an object as shown above 
does not appear all the time. For instance, the type of argument z within h3 in the above
example may be reconstructed to the type int by traversing up the call stack to its application
site (g3 2). In fact, functions like map in Example 6.1 never have this problem:

Example 6.4: 

g4 = (map enlist)_{(list t0) -> (list (list t0))};
g4 (1:2:nil)_{(list int)};

Even though here map is partially applied to enlist to yield a closure g4 with type
$\forall t_0.\ (\mathit{list}\ t_0) \to (\mathit{list}\ (\mathit{list}\ t_0))$, we have not lost any type information. Instantiation of $t_0$ to
int at the call site of g4 yields complete type information about all the internal identifiers of
both map and enlist. The problem with Examples 6.2 and 6.3 is that sometimes the types
of closures do not have any connection with the types of objects hidden inside them. In such
cases, we are in danger of losing type reconstruction information because the closure creation
site may no longer be available on the call stack.

Another interesting point is that polymorphic objects with universally quantified types do 
not pose this problem. The run-time type of such an object cannot be more specific than its 
compile-time definition type. For instance, in the following example the variable x within the
body of f5 has the universally quantified type $\forall t_0.\ (\mathit{list}\ t_0)$.

Example 6.5: 

def f5 y =
  { x = nil;

    def h5 z_{t1} = if length x_{(list t0)} == 1
                    then z:nil
                    else z:z:nil;
    in h5 };

Now, there is no question about the contents of the closure formed by h5 over its free
variable x. It can never contain an object whose type is more specific than $\forall t_0.\ (\mathit{list}\ t_0)$. For
our purposes, this means that the compile-time type of a polymorphic object provides sufficient
information for its run-time type reconstruction.

6.1.3 Discussion 

The examples presented above attempt to provide an intuitive understanding of the process of 
type reconstruction. It appears that for some polymorphic functions we are able to infer type 
reconstruction information from the parent-child relationships embedded in the activation tree 
while for others we need additional information at run-time for complete type reconstruction. 
Now we can characterize the problem of type reconstruction more concretely: 

1. First, we need to identify and record all the compile-time type information necessary for
type reconstruction. We also need a criterion to identify what additional type information,
if any, needs to be carried at run-time for complete type reconstruction of polymorphic
functions (Section 6.2).

2. Next, we need a compilation scheme that transforms the given program into one that
generates and propagates the additional type information (Section 6.3).

3. Finally, we need a type reconstruction algorithm that uses the explicit and implicit type
information at run-time and reconstructs the exact type of all run-time objects (Section 6.4).

6.2 Type Reconstruction Framework 

In this section, we discuss the general framework for run-time type reconstruction. First,
we describe the run-time execution model of Id programs. Using this model, we formulate
a strategy for reconstructing the complete run-time machine state. Finally, we identify the
essential information that needs to be recorded at compile-time and establish a type conservation
criterion that guarantees complete run-time type reconstruction.

6.2.1 Run-time Model of Program Execution 

Id is a non-strict, implicitly parallel language with an eager evaluation strategy. Below, we
summarize the execution model of a Kernel Id program. 

A program in Kernel Id consists of an expression query to be evaluated within the scope of 
a set of top-level value bindings and type declarations. Typically, this evaluation is carried out
in several phases as described below: 

Compile-time — First, the top-level bindings and type declarations are type-checked giving
rise to the global static environment. This environment records the exact types of all
global identifiers. Subsequently, all top-level value bindings, datatype constructors, and
internal function definitions are compiled into independent code-blocks.

Link/Load-time — All code-blocks are loaded and linked into the program memory giving 
rise to the global dynamic environment. 

Invocation-time — The top-level expression query is type-checked in the global static environment
and then compiled into a root code-block. At this point, exact types for all
local and free identifiers used in the query expression are known. The global static and 
dynamic environments together with the typed root code-block for the query expression 
constitute the complete initial state of the machine. 

Run-time — A code-block always executes in the context of an activation frame which
records the actual arguments bound to its formal parameters, the run-time objects bound
to its free identifiers, and the values of all its local identifiers during execution. An
activation frame is allocated at the time of a function application and it is deallocated
when that function terminates. In a sequential system, an activation frame corresponds
to the stack frame of the currently executing function. In the parallel execution model of
Id, the run-time stack generalizes to a tree of activation frames as shown in Figure 6.2.

The program starts execution by allocating an activation frame for the root code-block
recording its actual arguments and local identifiers. Subsequent function invocations
extend the dynamic activation tree with their own activation frames, executing in parallel
with their parent activation. Shared objects are allocated on a separate global heap
and are accessible via pointers from the activation frame (see Figure 6.2). Thus, at any
time during execution, the complete run-time state of the machine consists of the global
dynamic environment, the tree of active or suspended activation frames, and all the heap
objects accessible through the global identifiers or the activation frames. This is the state
of the machine we are interested in reconstructing.

[Figure 6.2: The Parallel Execution Model for Id. The figure shows a tree of activation frames (spread across computation nodes) growing from the root frame, with active threads, and a global heap of shared objects (spread across memory nodes).]

6.2.2 Type Reconstructibility 

Starting from the initial state of the machine as described above, we can view type reconstructibility
as an invariant condition to be maintained at each subsequent evaluation step
that modifies the run-time state of the machine. We identify two kinds of state modifications:
intra-procedural and inter-procedural.

The intra-procedural modifications to the state of the machine are due to the computation
within a code-block: accessing values of function parameters and free identifiers to compute
local values, allocating heap objects, modifying global or heap objects, etc. Since our language
has a sound type system, type-correct programs are guaranteed not to produce run-time type-errors
or to compute values that are type-inconsistent. This implies that any value bound to
an identifier in a given code-block must be consistent with the exact type of that identifier,
otherwise it could lead to a run-time type-error. This is true even for identifiers bound to
mutable objects. In other words, the actual values of mutable identifiers and heap objects could
change due to side-effects, but the types of those values would remain the same. Therefore,
once the exact types of all identifiers present within a code-block are determined, they serve to
identify the exact types of all the values computed and the heap objects allocated within the
code-block over its entire life-time.

The inter-procedural modifications to the state of the machine take place at a function 
application or return. A function application introduces a new activation frame that binds a 
new set of local identifiers and points to the heap objects allocated within the function. We need 
to ensure that the exact types of these new local identifiers and heap objects are reconstructible 
on the basis of the existing state of the machine before the function application. 

The above discussion suggests that an activation frame is an appropriate unit of type recon- 
struction. The entire state of the machine may be reconstructed by induction on the structure 
of the dynamic activation tree. As the base step, the exact types of all objects in the root
activation frame are already known at the start of the program. The inductive step is to ensure
that at every function application site that expands the dynamic activation tree, the type of
every slot in its activation frame can be identified and correctly instantiated. Below, we analyze
the compile-time information required to achieve this. 

6.2.3 Recording Compile-time Type Information 

In Section 6.1.1, we informally introduced the concept of the type-map of a function that was
used as a static template during its type reconstruction. Below, we make that definition more
concrete: 

Definition 6.1 (Type-map) Given a function $f = \lambda x_1 \cdots x_n . E$ with free identifiers
$\{z_1 \cdots z_m\} = \mathcal{F}(\lambda x_1 \cdots x_n . E)$ and locally bound identifiers $\{y_1 \cdots y_l\} = \mathcal{B}(E)$, its type-map, denoted by $TM_f$,
records the following information:

1. The function type, $f : \tau_1 \to \cdots \to \tau_n \to \tau_{n+1}$.

2. The types of all the function parameters $x_1 : \tau_1, \ldots, x_n : \tau_n$.

3. The type-schemes of all the free identifiers of the function, $z_1 : \sigma_{z_1}, \ldots, z_m : \sigma_{z_m}$.

4. The type-schemes of all the locally bound identifiers $y_1 : \sigma_{y_1}, \ldots, y_l : \sigma_{y_l}$.

5. The type-instance of the function identifier g at all application sites $(g\ a_1 \cdots a_k)$ within
the function body E. We also record whether an application site has been statically determined
to be a full-arity application site.

A type-map records the static types of all the parameters, the free identifiers, and the local
identifiers of a code-block along with some additional type information about its internal call
sites. It is essentially a mapping from the frame slots of a code-block's activation frame to
their static types. The type-map $TM_f$ is parameterized by the set of all its free type-variables
$\mathcal{F}(TM_f)$. This set exactly captures the missing information in the static type environment of
a function that needs to be instantiated at run-time.

We generate static type-maps for all code-blocks within the program at compile-time. These 
templates are then linked together with the compiled object code and may be accessed at 
run-time using the name of the code-block. As an example, Figure 6.3 shows the Kernel Id
translation and the type-map for the map function of Example 6.1. 
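
As a rough illustration of a type-map as a data structure, the following Python sketch transcribes the type-map of map from Figure 6.3 and instantiates it. The dictionary layout, the tuple encoding of types, and the helper names are assumptions made for this sketch; for simplicity it also treats every slot as a plain type rather than a type-scheme.

```python
# A type is a str (type variable) or a tuple (constructor_name, *argument_types).
INT, BOOL = ("int",), ("bool",)

def lst(t):
    return ("list", t)

def fn(*ts):
    """Right-nested function type: fn(a, b, c) is a -> (b -> c)."""
    result = ts[-1]
    for arg in reversed(ts[:-1]):
        result = ("->", arg, result)
    return result

MAP_SIG = fn(fn("t0", "t1"), lst("t0"), lst("t1"))

# Type-map of map, transcribed from Figure 6.3 (the dict layout is an assumption).
TM_MAP = {
    "signature": {"map": MAP_SIG},
    "arguments": {"f": fn("t0", "t1"), "l": lst("t0")},
    "locals": {"p": BOOL, "l1": lst("t1"), "x": "t0", "xs": lst("t0"),
               "y": "t1", "ys": lst("t1"), "l2": lst("t1")},
    "call_sites": {"f": fn("t0", "t1"), "map": MAP_SIG},
}

def free_vars(t):
    """Free type variables of a type."""
    if isinstance(t, str):
        return {t}
    return set().union(*(free_vars(a) for a in t[1:]))

def type_parameters(type_map):
    """F(TM_f): the type variables that must be instantiated at run-time."""
    return set().union(*(free_vars(t)
                         for section in type_map.values()
                         for t in section.values()))

def instantiate(type_map, subst):
    """Apply a run-time instantiation to every slot of the type-map."""
    def inst(t):
        if isinstance(t, str):
            return subst.get(t, t)
        return (t[0],) + tuple(inst(a) for a in t[1:])
    return {sec: {name: inst(t) for name, t in slots.items()}
            for sec, slots in type_map.items()}

print(sorted(type_parameters(TM_MAP)))          # ['t0', 't1']
frame_types = instantiate(TM_MAP, {"t0": INT, "t1": lst(INT)})
print(frame_types["locals"]["ys"])              # ('list', ('list', ('int',)))
```

With t0 = int and t1 = (list int), which is the instantiation arising in Example 6.1, every frame slot of map receives an exact type.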

6.2.4 The Principle of Type Conservation 

Consider a first-order application site for a function that does not have any free identifiers. We 
can reconstruct the types of all objects in its activation frame using the basic type reconstruction 
scheme described in Section 6.1.1. We assume that the name of the callee function can be 
identified from its current activation frame which also identifies its static type-map. The return
address information stored within the frame identifies the caller's activation frame and the exact
application site within the caller's body that gave rise to the call. Assuming that the caller's
frame has been reconstructed recursively, the exact type instantiation of the callee function
recorded within the type-map of the caller (Item 5 in Definition 6.1) provides the exact types
of all the arguments passed to the callee at this application site. Now, the callee function's
type-map may be instantiated by matching the types of the actual arguments with the types
of the parameters recorded in the callee's type-map.

def map f l =
  { p = nil? l;
    l1 = if p then nil
         else
           { x = hd l;
             xs = tl l;
             y = f x;
             ys = map f xs;
             l2 = y : ys;
             in l2 };
    in l1 };

Typemap

  Type Parameters             t0, t1
  Fn. Signature        map    (t0 -> t1) -> (list t0) -> (list t1)
  Arguments            f      (t0 -> t1)
                       l      (list t0)
  Local Frame Slots    p      bool
                       l1     (list t1)
                       x      t0
                       xs     (list t0)
                       y      t1
                       ys     (list t1)
                       l2     (list t1)
  Internal Call Sites  f      (t0 -> t1)
                       map    (t0 -> t1) -> (list t0) -> (list t1)

Figure 6.3: Kernel Id definition and the Type-map of map function.

Unfortunately, not all application sites are first-order, since our language allows higher-order
functions and partial applications (currying). As shown in Figure 6.4, partial applications create
function closures that simply record the supplied argument in a closure data-structure instead
of creating a new activation frame right away. The type of such closures may not provide
sufficient information regarding the type of the arguments captured within the closure (e.g.,
closure g2 of Example 6.2). Some functions refer to free identifiers that must also be recorded
in a closure at the point of their definition³ (e.g., function h3 of Example 6.3). The types of
such free identifiers may not be reflected in the overall type of the function closure either.

In a higher-order language such as Id, function closures are first-class objects, i.e., they may
be stored into heap data-structures, passed as arguments to other functions, and returned as
values from the function that created them. Therefore, the function definition site or the partial
application site that creates a closure is not guaranteed to be accessible when that closure is
used in further computation. As shown in Figure 6.4, such application sites are termed
invisible. A function closure expands into an activation frame only when all its arguments have
been accumulated. This final application site of the closure is termed the visible application
site because its position may be determined by examining the return address stored within the
expanded activation frame.

³ Lambda-lifting transformation [Joh85] may be used to lift nested functions with free identifiers into top-level
super-combinators that refer to only top-level identifiers. But this transformation restricts the type polymorphism
of free identifiers and does nothing to change a higher-order program into a first-order program. Therefore,
we choose to deal with the problem of free identifiers directly.

[Figure 6.4: Visible and Invisible Application Sites. Labels in the figure: Function, Closure, Invisible Partial Applications, Parent, Visible Final Application, Activation Tree.]

The type reconstruction scheme outlined above for first-order function applications would 
work with higher-order function closures only if the closure type instantiation recorded at the 
final application site has sufficient type information to instantiate the types of all the free 
identifiers and previous arguments accumulated within the closure. Such a function is called
type-conserving. This is a static property of a function's type signature and is characterized in
the definition given below. On the other hand, if a function does not satisfy the above property,
then some type information may be lost at its definition site or its invisible partial application 
sites. We also identify such information in the following definition for each of the invisible 
application sites: 

Definition 6.2 (Type Conservation) Given a function f with arity k, type-map $TM_f$, and
type-scheme $\forall \alpha_1 \ldots \alpha_n.\ \tau_1 \to \cdots \to \tau_k \to \tau_{k+1}$,

1. The type-variables $\mathcal{F}(TM_f) \setminus \mathcal{F}(\tau_1 \to \cdots \to \tau_k \to \tau_{k+1})$ are defined as not being conserved
at the function definition site.

2. The type-variables $\mathcal{F}(\tau_i) \setminus \mathcal{F}(\tau_{i+1} \to \cdots \to \tau_k \to \tau_{k+1})$ $(1 \le i < k)$ are defined as not being
conserved at its i-th application site.

3. The type-variables $\mathcal{F}(\tau_k \to \tau_{k+1})$ are defined as being conserved at the final (k-th)
application site.

4. The function f is said to be type-conserving if all the type-variables in its type-map are
being conserved, i.e., $\mathcal{F}(TM_f) = \mathcal{F}(\tau_k \to \tau_{k+1})$.

Informally, a type-conserving function can correctly instantiate its entire type-map with just
the run-time type of its final application closure. It is easy to check that map and enlist from
Example 6.1 are type-conserving, while f2 from Example 6.2, and f3 and h3 from Example 6.3
are not, which is why we were losing type information in those cases.
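
The test in Definition 6.2 is mechanical, and a small Python sketch makes the set computations concrete. The tuple encoding of types and the function name conservation are assumptions of this sketch, not part of the thesis; the type-map is approximated by the function signature plus the types of its free and local identifiers.

```python
# A type is a str (type variable) or a tuple (constructor_name, *argument_types).
def lst(t):
    return ("list", t)

def arrow(*ts):
    """Right-nested function type: arrow(a, b, c) is a -> (b -> c)."""
    result = ts[-1]
    for arg in reversed(ts[:-1]):
        result = ("->", arg, result)
    return result

def free_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(free_vars(a) for a in t[1:]))

def conservation(params, result, other_slots=()):
    """Apply Definition 6.2 to a function of arity k = len(params).

    params      -- [tau_1, ..., tau_k]
    result      -- tau_(k+1)
    other_slots -- types of the free and local identifiers in the type-map
    """
    k = len(params)
    signature = arrow(*params, result)
    tm_vars = free_vars(signature).union(*(free_vars(t) for t in other_slots))
    conserved = free_vars(arrow(params[-1], result))
    return {
        "lost_at_definition": tm_vars - free_vars(signature),
        "lost_at_site": {i: free_vars(params[i - 1])
                            - free_vars(arrow(*params[i:], result))
                         for i in range(1, k)},
        "conserved": conserved,
        "type_conserving": tm_vars == conserved,
    }

enlist = conservation(["t0"], lst("t0"))
map_ = conservation([arrow("t1", "t2"), lst("t1")], lst("t2"),
                    other_slots=["t1", "t2"])
f2 = conservation(["t0", "t1"], "t1")
h3 = conservation(["t1"], lst("t1"), other_slots=[lst("t0")])

print(enlist["type_conserving"], map_["type_conserving"])   # True True
print(f2["lost_at_site"])                                   # {1: {'t0'}}
print(h3["lost_at_definition"])                             # {'t0'}
```

For f2 the variable t0 is lost at the first (invisible) application site, and for h3 it is lost at the definition site through the free identifier x, matching the discussion of Examples 6.2 and 6.3.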

Definition 6.2 may be used by a compiler to detect functions that are not type-conserving. 
Furthermore, the definition shows exactly how much type information is lost at each application
site. The next question is what type reconstruction strategy should be devised for such functions.
Our scheme is to make every function closure self-sufficient, which means that a closure
for a non-type-conserving function must carry exact run-time encodings of its non-conserved 
types. We describe our compilation scheme in the next section. 

6.3 Compiler Support for Type Reconstruction 

In this section, we informally describe a compilation scheme that analyzes every function in
the program and transforms it to generate and propagate exact run-time type instantiations of 
its non-conserved type-variables where necessary. These encoded type-hints are inserted at the 
partial application sites that otherwise do not preserve this information and are deposited into 
the function's activation frame at the time of its final application. These type-hints may then 
be used to reconstruct the exact type instantiations of the non-conserved type-variables for the
current activation frame of the function. 

It is interesting to note that the propagation of type information from closure creation 
sites to their final application sites for non-type-conserving functions may be formulated as 
an overloading resolution problem which may then be handled using well-known techniques in 
the literature [Gup90, PJW92, WB89]. These techniques systematically translate overloading
into parametric polymorphism by replacing unresolved instances of overloaded identifiers in a
function with additional parameters that are supplied at its application site. In our scheme,
these parameters are the explicit type-hints that are used by the type reconstruction algorithm. 

Below, we intuitively describe our compilation strategy by means of examples. We also
provide a simplified but self-contained description of overloading resolution and translation 
mechanism as applied to type reconstruction. The full details of this transformation and the 
subsequent reconstruction process appear in Chapter 7. 

6.3.1 Detecting Violations of Type Conservation 

The first step in our compilation process is to identify the functions in the program that
may require additional type-hints for the non-conserved type-variables in their type-map. This
is straightforward given the test for type conservation in Definition 6.2. First, we type-check
each function f in the program and generate its type-map $TM_f$ according to Definition 6.1.
Then, using Definition 6.2, we determine which type-variables in its type-map, if any, are not
being conserved. For example, the type-map for function h3 from Example 6.3 is shown in
Figure 6.5. Its type signature is $\forall t_1.\ t_1 \to (\mathit{list}\ t_1)$. Comparing these two together we get,

$$\mathcal{F}(TM_{h3}) = \{t_0, t_1\}, \qquad \mathcal{F}(t_1 \to (\mathit{list}\ t_1)) = \{t_1\}$$

Therefore, the type-variable $t_0$ is not being conserved in the function h3 and it requires a
run-time type-hint for proper type reconstruction.

def h3 z =
  { y0 = length x;
    y1 = (==) y0 1;
    y2 = if y1 then
           { y3 = z:nil;
             in y3 }
         else
           { y4 = z:nil;
             y5 = z:y4;
             in y5 };
    in y2 };

Typemap

  Type Parameters                t0, t1
  Fn. Signature        h3        t1 -> (list t1)
  Fn. Arguments        z         t1
  Free Identifiers     x         (list t0)
                       length    ∀t2. (list t2) -> int
  Local Frame Slots    y0        int
                       y1        bool
                       y2        (list t1)
                       y3        (list t1)
                       y4        (list t1)
                       y5        (list t1)
  Internal Call Sites  length    (list t0) -> int

Figure 6.5: The Kernel Id definition and type-map of function h3 from Example 6.3.

6.3.2 Propagating Non-Conserved Type Information across Functions 

In general, additional type-hints may need to be propagated within the body of a function not
only to reconstruct its own non-conserved type-variables but to pass them on to other functions
within its body that require those type-hints. Also, some of the non-conserved type-variables
at these internal application sites may get partially or completely instantiated. We need to
record these instantiations so that appropriate type-hints may be generated at those sites.

Both the above problems may be addressed by viewing the reconstruction of the non-conserved
type-variables as an overloaded operation trec? that must be resolved within the body
of the given function. The standard overloading resolution mechanism picks up such unresolved
overloaded identifiers and arranges for the required information to be passed in as a parameter to
the function. Subsequent uses of the function ensure that the additional information can be
instantiated from the enclosing environment, thereby propagating the requirement outwards, if
necessary. We illustrate this process for the function h3 of Example 6.3:

Example 6.6: 

def f3_{(trec? t0)} x_{(list t0)} =
  { def h3_{(trec? t0)} z_{t1} = if length x_{(list t0)} == 1
                                 then z:nil
                                 else z:z:nil;
    in h3_{(trec? t0)} };
g3 = if ...
     then f3_{(trec? int)} (1:nil)_{(list int)}
     else f3_{(trec? bool)} (true:nil)_{(list bool)};

g3 2_{int};

Here, we have added a predicate⁴ (trec? $t_0$) as an annotation on the function h3. In general,
a predicate is added to a function's type signature for every non-conserved type-variable in
its type-map at the precise argument position where that information is being lost according
to Definition 6.2. Subsequently, the standard overloading resolution mechanism automatically
propagates this predicate to the place where h3 is referenced and to the enclosing lexical function
f3 because it remains uninstantiated (and hence unresolved) in its body. Finally, this predicate
propagates to the application sites of f3 where it is completely instantiated according to the
types of the arguments being supplied to f3 and is considered to be resolved.

Intuitively, the propagation of a predicate associated with a function represents a lack of
type information locally which must be supplied from the application site where this predicate is
instantiated. Note that the predicate need not provide the full type of the argument or the free
identifier of the function that requires such information (e.g., the identifier x in Example 6.6).
It only identifies the instantiations of the non-conserved type-variables present in that type.
This is sufficient to fully instantiate the type stored in the function's type-map corresponding
to that identifier. This scheme allows us to share the type instantiations of the non-conserved
type-variables across several identifier types that contain that type-variable. Thus, the number
of external type instantiations needed by a function is limited by the number of non-conserved 
type-variables and not by the number of its actual parameters or free identifiers present in its 
type-map. 

Another interesting observation is that predicate instantiations involving polymorphic type- 
variables are always considered as resolved and are not propagated outwards in the light of the 
discussion in Section 6.1.2. For instance, g3 in the above example might have been defined as:

Example 6.7: 

g3 = if ...
     then f3_{(trec? (list t))} (nil:nil)
     else f3_{(trec? bool)} (true:nil);

Here, (trec? (list t)) is an instantiation of f3's predicate according to its polymorphic
argument (nil:nil). Even though this predicate has an uninstantiated type-variable t, it is
not propagated any further because it is polymorphic at this point. It follows immediately that
there can be no unresolved predicates at the top-level because there are no free type-variables 
in the top-level type environment by construction. 

6.3.3 Program Translation 

The final step in our compilation process is to add extra hint parameters to the function
definitions that have non-conserved type-variable predicates of the form (trec? t). Likewise,
a predicate (trec? $\tau$) appearing at a function application site is transformed into a type-hint
encoding $\tau$ that is passed as an explicit argument at that application site.

⁴ We follow the terminology of [Gup90, WB89] where the usual Hindley/Milner type of a function is extended
with predicates to model overloaded identifiers. In Haskell [HWe90] these are known as contexts. The predicate
name trec? in our scheme stands for type-reconstructible?.

It is possible to either add one hint parameter for each non-conserved type-variable or 
group the hints together in a single hint-record from which the individual hints may be fetched. 
Our current scheme adds one hint parameter per type-variable at the position specified by 
Definition 6.2. This is because passing a small number of additional parameters is currently 
cheaper in our system than allocating and fetching from heap data-structures. 

The compiler keeps a record of the mapping between the non-conserved type-variables of 
each function and its additional hint parameters. This mapping, also called the hint-map, is
shown below: 

Definition 6.3 (Hint-map) Given a function $f = \lambda x_1 \cdots x_n . E$ with non-conserved type-variables
$\rho = \{\alpha_1, \ldots, \alpha_m\}$, its hint-map is the mapping $HM_f = \{(\alpha_1 \mapsto y_1), \ldots, (\alpha_m \mapsto y_m)\}$,
where $y_1, \ldots, y_m$ are its new additional hint parameters.

As an example, below we show the hint-map for the function h3 from Example 6.6:

    Type Variable    Hint Parameter
    t0               h3_hint_1



The actual type-hints may now be generated using an encoding of the type constructors 
and their type arguments. The encoding should permit type-hint construction and propagation 
from within the user program. Although not necessary, we may view the encoding as an Id
datatype as shown below: 

type type_hint = none | tc string (list type_hint);

The disjunct none encodes polymorphic type-variables that do not require any hint. The
disjunct tc encodes a type-constructor by its name and a list of encoded type-parameters. The
free type-variables of a type-hint $\tau$ are encoded using the corresponding additional parameters
of the enclosing function definition recorded in its hint-map.
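
To make the encoding concrete, here is a minimal Python sketch of the type_hint datatype together with an encoder and a decoder. The class names NoHint and Tc, the functions encode and decode, the tuple representation of types, and the choice to decode none as the placeholder variable "_" are all assumptions of this sketch rather than details given in the text.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class NoHint:
    """The 'none' disjunct: a polymorphic type variable needing no hint."""

@dataclass(frozen=True)
class Tc:
    """The 'tc' disjunct: a type constructor name and its encoded parameters."""
    name: str
    args: Tuple[object, ...] = ()

def encode(t, hint_params):
    """Encode a type as a hint.

    t           -- a str (type variable) or a tuple (constructor, *arguments)
    hint_params -- maps a non-conserved type variable to the hint value held
                   by the corresponding hint parameter of the enclosing function
    """
    if isinstance(t, str):
        # Variables without a hint parameter are treated as polymorphic.
        return hint_params.get(t, NoHint())
    return Tc(t[0], tuple(encode(a, hint_params) for a in t[1:]))

def decode(hint):
    """Turn a hint back into a type; 'none' decodes to the placeholder "_"."""
    if isinstance(hint, NoHint):
        return "_"
    return (hint.name,) + tuple(decode(a) for a in hint.args)

# (list int) with no enclosing hint parameters
print(encode(("list", ("int",)), {}))
# (list t0) inside a function whose hint parameter for t0 currently holds bool
h = encode(("list", "t0"), {"t0": Tc("bool")})
print(decode(h))   # ('list', ('bool',))
```

The second call shows how a free type-variable of a hint is filled in from the hint parameter of the enclosing function, as recorded in its hint-map.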

Continuing with Example 6.6 above, the following translation is obtained:

Example 6.8:

def f3 f3_hint_1 x =
    { def h3 h3_hint_1 z = if length x == 1
                           then z:nil
                           else z:z:nil;
      in h3 f3_hint_1 };
g3 = if ...
     then f3 (tc "int" nil) (1:nil)
     else f3 (tc "bool" nil) (true:nil);
g3 2;

Notice how the hints generated within g3 propagate into h3 via the hint parameters of f3
and h3. The appropriate hint will now be available in a dynamic activation of h3 where it may
be used along with its type-map to reconstruct the exact run-time type of x.
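
The hint propagation of Example 6.8 can be mimicked outside Id. The following is a minimal Python sketch, not the thesis implementation: the names NoHint, Tc, decode and make_g3 are hypothetical, the elided predicate of g3 is modeled by an explicit flag, and h3 returns its hint alongside its result only so that the value a debugger would find in the frame can be observed.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Assumed Python mirror of the Id datatype
#   type type_hint = none | tc string (list type_hint);

@dataclass(frozen=True)
class NoHint:
    """The `none` disjunct: a type-variable that needs no hint."""

@dataclass(frozen=True)
class Tc:
    """The `tc` disjunct: a constructor name plus encoded type arguments."""
    name: str
    args: Tuple["TypeHint", ...] = ()

TypeHint = Union[NoHint, Tc]

def decode(hint: TypeHint) -> str:
    """Render a hint as a type expression (hypothetical decoder)."""
    if isinstance(hint, NoHint):
        return "_"
    if not hint.args:
        return hint.name
    return "(" + " ".join([hint.name] + [decode(a) for a in hint.args]) + ")"

def f3(f3_hint_1, x):
    # f3 receives the hint for t0 and forwards it to h3's hint parameter.
    def h3(h3_hint_1, z):
        # A debugger halting here would find h3_hint_1 in the activation frame;
        # returning it is an assumption made only to expose it to the caller.
        result = [z] if len(x) == 1 else [z, z]
        return result, h3_hint_1
    return lambda z: h3(f3_hint_1, z)

def make_g3(flag):
    # `flag` stands in for the elided predicate (...) in the definition of g3.
    if flag:
        return f3(Tc("int"), [1])
    return f3(Tc("bool"), [True])

g3 = make_g3(False)
value, hint = g3(2)
x_type = decode(Tc("list", (hint,)))   # type of the free identifier x in h3
```

With the predicate false, the hint that reaches h3 decodes to bool, so x is reconstructed as (list bool) even though the call to f3 has already returned.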

6.4 Run-time Type Reconstruction 

Now, we have all the necessary information to reconstruct the entire run-time state of the
machine. As discussed earlier in Section 6.2.1, the global dynamic environment and the tree of
activation frames constitute the root set of the run-time state of the machine. All the relevant
heap objects may be accessed through this root set. The types of the global identifiers are
already available in the global static environment. Therefore, we only need to reconstruct the
types of all the activation frames in order to obtain the types of all the objects in the root
set. The type of any accessible heap object may then be reconstructed by examining the fully
instantiated type of an appropriate pointer within the root set that leads to the given heap
object.

[Figure: the activation tree (root, f3, g3, and the h3 frame created by (g3 2), holding
h3_hint_1, z = 2 and x) shown next to the heap (the h3 closure with its hint h3_hint_1,
encoded as tc "bool", and the list containing true). Type reconstruction: t1 = int;
t0 = Decode[[ h3_hint_1 ]] = bool.]

Figure 6.6: The Run-time State of Computation in Example 6.8.

The detailed algorithm for complete type reconstruction of an activation frame will be 
presented in Chapter 7. Here, we describe a type reconstruction example to illustrate the
modifications to the basic scheme presented in Section 6.1.1. These modifications use the type- 
hints inserted by the compilation scheme of Section 6.3 to account for the type information 
that is otherwise lost. 

6.4.1 A Type Reconstruction Example 

Figure 6.6 shows a snapshot of the state of the machine during the execution of translated 
Example 6.8. Let us suppose that the predicate (...) in the definition of g3 evaluates to false 
at run-time. The computation of g3 expands into an activation frame for f3, returning a
closure for the function h3 with the appropriate type-hint and the second argument hidden
inside. We assume that this computation has terminated and the activation frame for f3
has been deallocated (shown with dotted lines in Figure 6.6) so that there is no trace of the 
application site where g3 was constructed. The evaluation of the application (g3 2) unfolds 
the computation into an activation frame for h3 as shown in Figure 6.6. Let us also suppose 
that the program is halted when h3 has just been invoked. The problem is to reconstruct the 
types of the objects in h3. 

The type-map of the function h3 given in Figure 6.5 shows that it needs the exact type
instantiations of the type-variables t0 and t1 for proper type reconstruction. From the
hint-map given in Section 6.3.3, we know that the additional parameter h3_hint_1 encodes the exact
type instantiation for the type-variable t0, which is decoded to produce the type bool. The type
of the free identifier x within h3 may now be reconstructed to be (list bool) as given by its
type-map. The remaining type-variable t1 is instantiated to the type int as described earlier
in Section 6.1.1 by matching the application site type instance recorded in the root type-map
with the full type signature of h3. This completely instantiates h3's type-map, yielding the
exact types of its function parameter z and its other local identifiers.
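
The two sources of type information used above (a decoded hint for t0 and call-site matching for t1) can be sketched in Python. This is an illustrative sketch under stated assumptions: the contents of h3's type-map and signature are inferred from the text rather than copied from Figure 6.5, types are modeled as nested tuples, and the helper names substitute, match and decode are hypothetical.

```python
def substitute(ty, subst):
    """Apply a type-variable substitution to a type (strings or nested tuples)."""
    if isinstance(ty, str):
        return subst.get(ty, ty)
    return tuple(substitute(t, subst) for t in ty)

def match(pattern, instance, subst, type_vars):
    """One-way matching of a type signature against a concrete instance."""
    if isinstance(pattern, str):
        if pattern in type_vars:
            if subst.setdefault(pattern, instance) != instance:
                raise TypeError("conflicting instantiation for " + pattern)
        elif pattern != instance:
            raise TypeError("constructor mismatch")
        return subst
    if isinstance(instance, str) or len(pattern) != len(instance):
        raise TypeError("shape mismatch")
    for p, i in zip(pattern, instance):
        match(p, i, subst, type_vars)
    return subst

def decode(hint):
    """Decode a hint of the assumed form ("tc", name, [args]) into a type."""
    _tag, name, args = hint
    return name if not args else (name, *[decode(a) for a in args])

type_vars = {"t0", "t1"}
h3_type_map = {"x": ("list", "t0"), "z": "t1"}        # assumed type-map of h3
h3_signature = ("->", "t1", ("list", "t1"))           # assumed signature of h3
h3_hint_map = {"t0": "h3_hint_1"}                     # hint-map from Section 6.3.3
frame = {"h3_hint_1": ("tc", "bool", []), "z": 2}     # run-time activation frame
call_site_instance = ("->", "int", ("list", "int"))   # recorded for (g3 2)

# Step 1: decode the hints named by the hint-map.
subst = {tv: decode(frame[param]) for tv, param in h3_hint_map.items()}
# Step 2: instantiate the conserved variables from the application site.
match(h3_signature, call_site_instance, subst, type_vars)
# Step 3: instantiate the type-map.
reconstructed = {name: substitute(ty, subst) for name, ty in h3_type_map.items()}
```

Under these assumptions the frame of h3 is fully typed: x is a list of bool and z is an int.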

As noted in Section 6.1.2, the type reconstruction schemes described earlier [App89, Gol91,
GG92] would fail to reconstruct the type of x in the body of h3. The reason is that these schemes
only use the type information derived from the current stack of activation frames. When
higher-order closures such as g3 are invoked and type reconstructed, the function producing them, f3,
may not be present on the current stack. Any clues that f3 might have provided regarding the
types of free identifiers of g3 are therefore not accessible during reconstruction.

6.5 Compiler Optimizations 

It might appear that our compilation scheme incurs a lot of run-time overhead due to additional
parameters and the encoding and decoding of types, but our experience has been that realistic
programs contain very few (if any) non-type-conserving functions, so the overhead of generating
and propagating their type-hints is reasonably small. Although our current performance is
adequate, we hope to be able to improve our scheme through several compiler optimizations
that are discussed below.

6.5.1 Rearranging the Hint Parameters 

Currently, additional type-hint parameters required by a function definition are placed just
in front of the regular parameter that would otherwise lose that information according to
Definition 6.2. This is not strictly necessary. We can place a hint parameter either before
or after the first regular parameter whose type contains the non-conserved type-variable that
is encoded by the hint parameter. This rearrangement does not affect program translation
(Section 6.3.3) since the regular parameter and the associated type-hint parameter are still
supplied together at the same application site. Of course, the hint parameters corresponding
to the non-conserved type-variables in the types of the free identifiers of a function must still
be placed right up front.

The benefit of such rearrangement is that it may sometimes reduce the propagation overhead
of type-hints by removing some extra parameters altogether via $\eta$-reduction. For example, the
following alternate translation for Example 6.6 is also valid (compare with Example 6.8):

Example 6.9:

def f3 x =
    { def h3 h3_hint_1 z = if length x == 1
                           then z:nil
                           else z:z:nil;
      in h3 };
g3 = if ...
     then f3 (1:nil) (tc "int" nil)
     else f3 (true:nil) (tc "bool" nil);






Here, the parameter f3_hint_1 of f3 was pushed after its parameter x, which made this
$\eta$-reduction possible.

6.5.2 Arity Analysis 

Definition 6.2 conservatively prescribes that the only type-variables that are conserved in a
multiple-arity function are those present in its final application type because the function could
be curried over its initial arguments. This definition can be specialized to include the types
of all the arguments present at an application site, if that site is guaranteed to be accessible
through the dynamic activation tree. That is, all arguments at an application site that leads
to a full application may be treated as being conserved at that application site. For example:

Example 6.10:

def f l1 l2 l3 = (length l1)+(length l2)+(length l3);
g = f (1:2:nil);
...(g (true:nil) ("foo":nil))...

Definition 6.2 predicts that the types of lists l1 and l2 are not conserved by the definition
f. But at the final application site for closure g, l2 is also available immediately, which implies
that its type is conserved at this application site.

In general, at compile-time, it may not be possible to recognize the application of an arbitrary
function closure as its final application site. But it is easy to recognize the special case of
first-order (or full-arity) application of a function where all its arguments are supplied at once.
In such cases, the types of all the actual arguments and the type-variables present in them may
be instantiated from its application site, although the function may still require type-hints in
order to reconstruct the types of its free identifiers.

In our current scheme, it is not possible to optimize away the type-hints prescribed by
Definition 6.2 for a function at its first-order application sites because the function definition
may still require type-hint parameters due to higher-order application sites present elsewhere.
This is simply a consequence of our choice to provide type-hints by adding extra parameters
to a function's definition. Alternatively, we can either generate a specialized first-order version
of the function that does not carry any type-hints and use it wherever possible, or choose
another mechanism for hint propagation that is transparent to the usual parameter passing
conventions. Then, we would be able to tailor the type-hints according to the information
available at a particular call site without affecting the function's definition.

6.5.3 Escape Analysis 

Together with first-order call site information, if the types of the free identifiers of a function
are also known to be reconstructible via the currently visible activation tree, then no extra
type-hints are necessary at all, even if the function was determined to be non-type-conserving
by Definition 6.2. Escape analysis of function closures offers this information. Specifically,
if analysis shows that a function closure does not escape from the lexical scope where it was
defined, then the correct instantiations of its free identifiers would still be available from the
activation frame of this ancestor in the activation tree. In that case, we do not need to set
up extra type-hints to reconstruct these instantiations within the given function's activation
frame.

It is possible to use the region-based closure typing system described in Part I of this thesis
to undertake such escape analysis for internal function closures. We simply need to associate a
fresh region variable with each internal function definition that statically tracks the movement
of its closure data-structure. Presence of this region variable in the type environment of the
enclosing control block, or in the type of the returned value from that block, would indicate that
the function closure is escaping the scope of its definition.

6.5.4 Tail Calls 

Our current scheme does not deal with tail calls where the usual caller-callee relationship is 
violated. A tail call removes the caller's activation frame from the activation tree and connects 
the callee to the parent of the caller directly. In such a situation, the application site information
for the callee is lost. Consider the following example: 

Example 6.11: 

def f x = 1 + length x;
def g n = if n == 1
          then f (1:2:nil)
          else f (true:nil);



Without tail calls, the type of x in an activation of f can be determined by locating its call
site within the then or the else branch of the conditional inside g. But, if these applications
were compiled as tail calls, then f's activation would be connected directly to the top-level
and the call site information would be lost.

It is easy to extend our scheme to deal with this situation. We simply modify Definition 6.2 
to reflect the fact that no call site information is available for f and therefore explicit type-hints 
may be needed for all of its free type-variables. This leads to the following translation: 

Example 6.12: 

def f f_hint_1 x = 1 + length x;
def g n = if n == 1
          then f (tc "int" nil) (1:2:nil)
          else f (tc "bool" nil) (true:nil);



Now, all the type information is available from within the activation of f. Of course, this
scheme is not optimal because it ignores the call site information even when it is available using
regular calling conventions. In order to incorporate that flexibility, we need to generate several
application site specific versions of the function definition as discussed earlier.

6.5.5 Type Specialization 

Our current scheme generates and interprets encoded type information in order to reconstruct 
the types of all local and free identifiers of a function. We do not take any position on what 
to do with these types. This strategy is adequate and desirable for a source debugger because 
it may wish to manipulate an object in many different ways. Once the type of the object is 
reconstructed, it can be interpreted to traverse and manipulate the object in any desired way.
It is possible to apply the principle of type conservation (Definition 6.2) and the program
analysis and translation strategy (Section 6.3) in any specific context to allow complete analysis
of run-time objects in that context. For instance, in order to display objects in the Id debugger,
we could compile a parameterized display routine for every datatype occurring in the program.
Run-time type reconstruction would be used to compose these display routines appropriately
and then display the given object directly by passing it to its display routine without any type
interpretation.

Similarly, it is possible to generate specialized garbage collection routine(s) for every
function instead of its type-map, parameterized by GC-routines that correspond to the free
type-variables in its type-map. Then, we can generate and propagate closures of GC-routines instead
of type-hints as described in Section 6.3. These parameter routines would be picked up
automatically by the GC-routine(s) of the function from its activation frame at the time of garbage
collection. This scheme would operate in the same way as the tagless garbage collection
mechanism proposed by Goldberg [Gol91] where function-specific and site-specific garbage collection
routines are generated that understand the structure and the liveness properties of the local
identifiers of a function. Moreover, no additional hash-tables would be necessary in order to
keep track of partially traversed polymorphic shared objects as shown in [GG92] because
complete type reconstruction ensures that the entire traversal of a shared object can be completed
the very first time it is encountered.

6.6 Implementation Status 

The type reconstruction scheme described in this chapter has been implemented in two different 
applications within the Id programming environment. We briefly discuss these implementations 
below. 

6.6.1 Type Reconstruction in a Polymorphic Source Debugger 

The need to solve the problem of type reconstruction initially arose while attempting to display 
polymorphic objects within a source-level debugger for Id. A preliminary version of the type
reconstruction scheme described in this chapter was implemented during the fall of 1992 in 
the context of the Id source debugger [Car93] for the Monsoon dataflow architecture and was 
reported in [AC93]. 

The Id compiler [Tra86] was modified to perform the type analysis and hint generation for 
every function within the user program as shown in Section 6.3. A simple Id datatype encoding 
was used for type-hints as shown in Section 6.3.3. The compiler also generated the type-map 
and the hint-map for every function. In order to reduce the book-keeping within the debugger,
the types of temporary, internal identifiers were dropped from the type-map of a function; only
source-level, user-defined identifiers were kept together with their position in the function's
activation frame. 

The Id debugger [Car93] was written in Lisp and executed on the host processor in the
front. It allowed a user to stop the Id program executing on the Monsoon processor in the back 
when certain pre-specified events were triggered. The user could then traverse the current tree 
of activation frames within the Monsoon memory and request function arguments and local 
identifiers to be displayed along with any heap objects that they pointed to. Objects within 
the Id run-time system did not carry any type-tags; therefore, complete type reconstruction
was needed in order to decipher the run-time object structure. The debugger reconstructed the 
object types one frame at a time using the run-time type-hints and the type-map and the hint- 
map information provided by the compiler. These types were then interpreted to traverse and 
display the contents of the requested identifiers properly. Objects hidden inside higher-order 
function closures were not displayed, although such objects could be displayed once the closure
was applied and gave rise to an activation frame. 






The entire Id programming environment called "Id-World", containing an editor-based
incremental Id compiler, a simulator for the Monsoon architecture, and the Id source debugger
with complete polymorphic run-time type reconstruction, was successfully demonstrated during
the ACM Conference on Functional Programming Languages and Computer Architecture held
in Copenhagen, Denmark, in June 1993.

In [AC93], we presented a preliminary compilation and type reconstruction scheme which
omitted some of the formal details. The complete compilation scheme and the type
reconstruction algorithm now appear in Chapter 7 along with a proof of correctness.

6.6.2 Type Reconstruction for Tagless Garbage Collection 

During the fall of 1993, full support for run-time type reconstruction was integrated into the
Id compiler for the *T multi-threaded architecture and its run-time system [CCF+93] for the
purpose of performing tagless garbage collection. Naturally, this required complete type
reconstruction for every slot of every function activation frame and all the heap objects reachable
from them including higher-order function closures.

We conducted a feasibility study involving the design and implementation of a simple "mark- 
and-sweep" garbage collector for the *T architecture based on the run-time type reconstruction 
mechanism. We compared the performance of this scheme against a conservative garbage collector
and a compiler-directed explicit allocation/deallocation scheme, all implemented within
the same framework. The results of this study were first reported in [AFH94] and are presented 
here in Chapter 8. The study showed that tagless garbage collection based on type reconstruc- 
tion was not only feasible but also beneficial for scientific programs with large scalar arrays. 
The study also indicated that the type reconstruction cost was a small fraction of the overall 
garbage collection cost. Complete details appear in Chapter 8. 






Chapter 7 

Formal Framework for Run-time 
Type Reconstruction 



In this chapter, we formalize the reconstruction strategy outlined in the last chapter. Section 7.1
presents the complete grammar for our intermediate language Kernel Id. In Section 7.2, we
describe a compilation scheme that analyzes the source program to identify the additional type
information necessary for complete type reconstruction and then transforms the program to
propagate this information at run-time. Section 7.3 presents the run-time type reconstruction
algorithm and discusses its complexity. Finally, in Section 7.4 we show the correctness of our
algorithm.

7.1 The Kernel Id Intermediate Language 

Our description of type reconstruction is based on the intermediate language Kernel Id,
shown in Figure 7.1. This language supports a rich set of datatypes including typical
scalar basetypes, general algebraic (sum-of-products) datatypes, n-dimensional arrays, and
curried functions. Records and tuples are a special case of algebraic datatypes with a single
product disjunct. We also assume a rich set of primitive functions for basetypes and array
construction/selection/modification, as well as standard predefined algebraic datatypes such as
list and bool.

Kernel Id allows multi-arity function definitions and general algebraic type declarations.
Every sub-expression in this language is given an explicit name that permits accurate
representation of data-sharing. In particular, we assume that every $\lambda$-expression has an identifier
name associated with it, i.e., $\lambda$-expressions are only allowed to occur on the right hand side
of a binding. Simply nested let-bindings are generalized to a recursive letrec-style block of
bindings. Similarly, a 2-way conditional operator (if...then...else...) is generalized to an
m-way Case dispatch operator. The semantics of this language has been given directly in terms
of graph rewriting rules as shown in [AA91, AA94]. However, we will use the operational
machinery described in Chapter 3 while showing the correctness of our type reconstruction
algorithm.

Kernel Id is a more realistic abstraction of actual intermediate form used in the Id
compiler [AA91, Tra86] than the tiny expression language used in Chapter 3. The Id source
language supports special syntactic constructs such as list and array comprehensions, complex
pattern matching, and nested function and type declarations [Nik91]. During compilation,
the Id source program is translated into a Kernel Id program using standard front-end analyses
and transformations such as comprehension-desugaring, scope-analysis, type-checking, and
pattern-matching compilation [AA91, Gup90, Tra86]. These transformations result in a Kernel
Id program where every sub-expression has a unique name and a well-defined Hindley/Milner
type, so that all internal type declarations can be lifted to the top-level. Although we use
source Id syntax in our examples, their correspondence to a Kernel Id program should be easy
to follow.

$c \in$ Constant
$f, x, y, z, \ldots \in$ Identifier
$SE \in$ Simple Expression
$E \in$ Expression
$PF_n \in$ Primitive Fn. with $n$ arguments
$\mathit{Case}_m\_T \in$ $m$-way Case Dispatch for type $T$
$C_m \in$ $m$-th Constructor Identifier with $k_m$ arguments

Constant     ::=  Integer | Float
SE           ::=  Identifier | Constant
E            ::=  SE | $PF_n(SE_1, \ldots, SE_n)$
               |  $\lambda x_1 \cdots x_n . E$ | $SE_1\ SE_2$
               |  $\mathit{Case}_m\_T\ SE\ (E_1 \cdots E_m)$
               |  Block
Block        ::=  { [Binding;]* in SE }
Binding      ::=  Identifier = E
Declaration  ::=  Binding | Type-Decl
Type-Decl    ::=  type $T\ \alpha_1 \cdots \alpha_n = C_1\ \tau_{11} \cdots \tau_{1k_1} \mid \cdots \mid C_m\ \tau_{m1} \cdots \tau_{mk_m}$
Program      ::=  [Declaration;]*

Figure 7.1: The Kernel Id Intermediate Language.

7.2 Compiler Support for Type Reconstruction 

7.2.1 A Type System for Computing Type-hints 

Figure 7.2 shows a systematic way of performing type-hint analysis and propagation discussed
informally in Chapter 6 within the context of the Kernel Id intermediate language. We have
modified the usual Hindley/Milner typing rules [Mil78] to compute and propagate additional
type-hint information. In this system, the conventional Hindley/Milner type of a function
closure ($\tau_1 \to \tau_2$) is prefixed with the set of type-variables $\rho$ that are not conserved in its
immediately previous partial application.¹

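¹ Although $\rho$ is defined here to be a set, the ordering of the type-variables within the set would become
important when we translate their type instantiations into actual type-hint parameters.

Types

$\alpha, \beta \in$ Type-Variable
$T^n \in$ Type-Constructor with $n$ type arguments
$\tau \in$ Type
$\rho \in$ Type-hint Set $=$ POWER-SET(Type)
$\sigma \in$ Type Scheme
$TE \in$ Type Environment $=$ Identifier $\to$ Type Scheme

$$\tau ::= \alpha \mid \mathit{int} \mid \mathit{float} \mid (\mathit{nd\_array}\ \tau) \mid (T^n\ \tau_1 \cdots \tau_n) \mid \tau_1 \to \tau_2 \mid \rho.\tau$$
$$\sigma ::= \forall \alpha_1 \cdots \alpha_n . \tau$$

CONST:
$$\frac{\mathit{typeof}(c) \succ \emptyset.\tau}{TE \vdash c : \tau,\ \emptyset}$$

PRIMAPP:
$$\frac{\mathit{typeof}(PF_n) \succ \emptyset.(\tau_1 \to \cdots \to \tau_n \to \tau_{n+1}) \qquad TE \vdash SE_i : \tau_i,\ \rho_i \quad 1 \le i \le n}{TE \vdash PF_n(SE_1, \ldots, SE_n) : \tau_{n+1},\ \bigcup_{1 \le i \le n} \rho_i}$$

CASE:
$$\frac{TE \vdash SE : (T\ \tau_1 \cdots \tau_n),\ \rho_0 \qquad TE \vdash E_i : \tau,\ \rho_i \quad 1 \le i \le m}{TE \vdash \mathit{Case}_m\_T\ SE\ (E_1 \cdots E_m) : \tau,\ \bigcup_{0 \le i \le m} \rho_i}$$

IDENT:
$$\frac{TE(x) \succ \rho.\tau}{TE \vdash x : \tau,\ \rho}$$

APP:
$$\frac{TE \vdash SE_1 : (\tau' \to \rho.\tau),\ \rho_1 \qquad TE \vdash SE_2 : \tau',\ \rho_2}{TE \vdash SE_1\ SE_2 : \tau,\ (\rho \cup \rho_1 \cup \rho_2)}$$

ABS:
$$\frac{\begin{array}{l}
TE + \{x_1 \mapsto \tau_1, \ldots, x_n \mapsto \tau_n\} \vdash E : \tau_{n+1},\ \rho \\
\text{Let } TM \text{ be the type-map of } \lambda x_1 \cdots x_n . E \\
\rho_0 = \mathcal{F}(TM) \setminus \mathcal{F}(\tau_1 \to \cdots \to \tau_n \to \tau_{n+1}) \\
\rho_i = \mathcal{F}(\tau_i) \setminus \mathcal{F}(\tau_{i+1} \to \cdots \to \tau_n \to \tau_{n+1}) \quad 1 \le i < n \\
\rho' = \mathcal{F}(\rho) \setminus (\rho_0 \cup \cdots \cup \rho_{n-1})
\end{array}}{TE \vdash \lambda x_1 \cdots x_n . E : \rho_0.(\tau_1 \to \rho_1.(\tau_2 \to \cdots \rho_{n-1}.(\tau_n \to \rho'.\tau_{n+1}) \cdots)),\ \emptyset}$$

BLOCK:
$$\frac{\begin{array}{ll}
TE + \{x_i \mapsto \tau_i\} \vdash E_i : \tau_i,\ \rho_i & i \in b_1 \\
TE_{b_1} = TE + \{x_i \mapsto \mathit{Gen}(TE, \tau_i)\} & i \in b_1 \\
TE_{b_1} + \{x_i \mapsto \tau_i\} \vdash E_i : \tau_i,\ \rho_i & i \in b_2 \\
TE_{b_2} = TE_{b_1} + \{x_i \mapsto \mathit{Gen}(TE_{b_1}, \tau_i)\} & i \in b_2 \\
\quad \vdots & \\
TE_{b_k} = TE_{b_{k-1}} + \{x_i \mapsto \mathit{Gen}(TE_{b_{k-1}}, \tau_i)\} & i \in b_k \\
TE_{b_k} \vdash SE : \tau,\ \rho_0 &
\end{array}}{TE \vdash \{ x_1 = E_1; \cdots; x_n = E_n\ \mathbf{in}\ SE \} : \tau,\ \bigcup_{0 \le i \le n} \rho_i}$$

Figure 7.2: Rules for computing Non-Conserved Type Information for Kernel Id Programs.

Definition 6.2 identifies the exact set of non-conserved type-variables at each argument
position of a multi-arity, user-defined function. Type-conserving positions are assigned the
empty set $\emptyset$. Each type-variable $t \in \rho$ may be taken to represent the overloading predicate
(tree? $t$) as shown in Section 6.3.2. Type schemes $\sigma$ generalize and instantiate such augmented
types as usual. We derive typing judgments of the following form:

$$TE \vdash E : \tau,\ \rho$$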

Here, $TE$ is a type environment mapping identifiers to type schemes, $\tau$ is the type assigned to
the expression $E$, and $\rho$ is the set of type-hint instantiations within $E$ that are needed during
its type reconstruction. These type-hints are required when a non-type-conserving function
is referenced or is applied inside the expression $E$. All such type instantiations are collected
and propagated up to the nearest enclosing function definition where they become part of that
function's type-hint requirements.

Looking at Figure 7.2, predefined constants and primitive functions (CONST and PRIMAPP
rules) do not give rise to any non-conserved type-variables since they always execute within
the current activation frame and never create any partial applications. The CASE rule is
also straightforward. It simply collects the type-hint instantiations inductively from its
sub-expressions while ensuring that all branches have the same type.

The IDENT rule instantiates the augmented type of a user-defined function, exposing the
exact instantiations of its non-conserved type-variables that need to be provided at that point.
The augmented type instantiation is immediately split into the actual type $\tau$ and the set of
type-hint instantiations $\rho$. Note that the size of the set $\rho$ remains fixed during its type instantiation.
In particular, an empty set of type-variables $\emptyset$ can never be instantiated to a non-empty set of
type instantiations and vice-versa.²

New type instantiations may also be introduced by the APP rule, where the augmented type
of the result closure exposes the exact instantiations of the non-conserved type-variables at
that application site. All such instantiations are collected and propagated to be resolved at the
nearest enclosing $\lambda$-expression.

The ABS rule computes the set of non-conserved type-variables of a $\lambda$-expression and records
them within its augmented type so that they may be instantiated later by the IDENT rule or
the APP rule. The type-hint sets $\rho_0 \cdots \rho_{n-1}$ are computed for each argument position of the
function as given by Definition 6.2. These sets are placed along the type signature of the
function just after the argument position where that type information would otherwise be lost.
Type-variables that are conserved at the various argument positions are excluded from the
corresponding type-hint sets. The final type-hint set $\rho'$ computes the additional type-variables
for which type-hints are required by internal sub-expressions of the $\lambda$-body.
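
The set computations in the ABS rule can be made concrete with a small Python sketch. It is an illustration under assumptions, not the compiler's code: types are nested tuples whose first element is a constructor name, type-variables are strings beginning with an apostrophe, and the helper names ftv, arrow and hint_sets are hypothetical.

```python
def ftv(ty):
    """Free type-variables of a type; variables are strings starting with "'"."""
    if isinstance(ty, str):
        return {ty} if ty.startswith("'") else set()
    out = set()
    for t in ty[1:]:            # ty[0] is the constructor name
        out |= ftv(t)
    return out

def arrow(arg_types, result):
    """Build the curried type a1 -> (a2 -> ... -> result)."""
    ty = result
    for a in reversed(arg_types):
        ty = ("->", a, ty)
    return ty

def hint_sets(type_map, arg_types, result, body_hints):
    """Return ([rho_0, ..., rho_{n-1}], rho') following the ABS rule."""
    n = len(arg_types)
    tm_vars = set().union(*(ftv(t) for t in type_map.values()))
    # rho_0: variables in the type-map that the full signature does not conserve.
    rho = [tm_vars - ftv(arrow(arg_types, result))]
    # rho_i: variables of argument i absent from the remaining signature.
    for i in range(1, n):
        rho.append(ftv(arg_types[i - 1]) - ftv(arrow(arg_types[i:], result)))
    # rho': variables of the body's hint instantiations not already covered.
    body_vars = set().union(*(ftv(t) for t in body_hints))
    rho_prime = body_vars - set().union(*rho)
    return rho, rho_prime

# f from Example 6.10: three list arguments of unrelated element types.
f_args = [("list", "'a"), ("list", "'b"), ("list", "'c")]
f_rho, f_rho_prime = hint_sets(
    {"l1": f_args[0], "l2": f_args[1], "l3": f_args[2]}, f_args, "int", [])

# h3 from Example 6.6 (type-map contents assumed from the text of Section 6.4.1).
h3_rho, h3_rho_prime = hint_sets(
    {"x": ("list", "'t0"), "z": "'t1"}, ["'t1"], ("list", "'t1"), [])
```

For f this yields hints for the element types of l1 and l2 but not l3, matching the prediction quoted for Example 6.10, and for h3 it yields the single non-conserved variable t0 recorded in its hint-map.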

The BLOCK rule is a generalization of the usual Hindley/Milner LET rule as applied to the
more complex syntax of the Kernel Id language. The type generalization operation $\mathit{Gen}(TE, \tau)$
generalizes the augmented type $\tau$ (which may contain embedded type-hint sets) into a type
scheme $\forall \alpha_1 \cdots \alpha_n . \tau$. We assume that the bindings in a block, numbered $1 \ldots n$, are partitioned
into $k$ groups of mutually recursive bindings $b_1 \cdots b_k$ ($b_1 + \cdots + b_k = n$), and these groups
are topologically sorted such that definitions occur before their uses. Each group of mutually
recursive bindings is type-checked within a type environment that assigns polymorphic type
schemes to the identifiers bound in previous groups and monomorphic types to the identifiers
bound within the same group. This transformation maximizes Hindley/Milner polymorphism
for an unordered sequence of bindings [Gup90, HWe90].
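
The partitioning step assumed by the BLOCK rule is a dependency analysis. One common way to obtain such groups, shown here as a Python sketch, is Tarjan's strongly connected components algorithm; the source does not say which algorithm the Id compiler uses, so this choice, the function name binding_groups and the example bindings are all assumptions.

```python
def binding_groups(uses):
    """Partition bindings into mutually recursive groups, definitions first.

    `uses` maps each identifier bound in the block to the identifiers that
    its right-hand side mentions.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in uses[v]:
            if w not in uses:          # free identifier bound outside the block
                continue
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:         # v is the root of a component
            group = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                group.append(w)
                if w == v:
                    break
            groups.append(sorted(group))

    for v in uses:
        if v not in index:
            visit(v)
    return groups

# Hypothetical block: main uses even and ident; even and odd are mutually recursive.
example = {"main": ["even", "ident"], "even": ["odd"], "odd": ["even"], "ident": []}
groups = binding_groups(example)
```

Because a component is emitted only after every component it depends on, the resulting list is already in the definitions-before-uses order that the rule requires.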



² This property ensures that each type-variable instantiation may be treated as an independent parameter
to be inserted at that site during translation, although it may introduce some subtle typing discrepancies as
discussed in Section 7.2.4.






7.2.2 Type Inference 

The type system shown in Figure 7.2 may be directly used as a basis for automatically inferring
augmented Hindley/Milner types along the lines of the standard Hindley/Milner type inference
algorithm [Mil78]. The type-hint sets are considered to be ordered and of fixed size, and may
be treated as part of the type signature of a function. In particular, note that a non-empty
set can never be unified with an empty set. Therefore, the usual structural term unification
algorithm [Rob65] would suffice for matching types.³

The type inference algorithm would be similar to the infer algorithm cited in Section 3.4 with
minor modifications. We need to do some book-keeping in order to collect and propagate
type-hint instantiations from within expressions and process them at the enclosing function definition.
The modified type inference algorithm would return the possibly augmented Hindley/Milner
type ($\tau$) of an expression along with the set of type-hint instantiations ($\rho$) gathered from within
the expression. Type generalization, instantiation, and substitution would now take place on
augmented types. In case of a user-defined function, the algorithm would also compute its
type-map as given by Definition 6.1 and the type-hint sets $\rho_0 \cdots \rho_{n-1}, \rho'$ as shown in the ABS
rule. These sets would then be attached to their appropriate argument positions within the
type signature of the function.

7.2.3 Program Translation and Type-Hint Generation 

The final step in the compilation process is to add explicit parameters to functions with non- 
trivial augmented types and to provide appropriate type-hints at their application sites. 

Generation of type-hints uses a run-time encoding and decoding scheme as shown in
Figure 7.3. The encoding is performed under a Translation Environment $\Gamma$ that maps free
type-variables of a given type $\tau$ to value-domain identifiers encoding those type-variables. The
encoding scheme TEnc[[ ]] produces a Kernel Id expression which, when executed at run-time,
generates the type-hint encoding for the given type scheme; it does not generate the encoded
type scheme itself. This is so because the encoding scheme is used as part of the
source-to-source compilation process that translates a Kernel Id program into another Kernel Id program
with explicit type-hint propagation.

For each type constructor $T^n$, we denote its run-time encoding by a new constant $\overline{T^n}$. A
bound type-variable $\alpha_i$ in a type scheme $\forall \alpha_1 \cdots \alpha_n . \tau$ is encoded as a special constant
type-constructor $\overline{T^0_{\alpha_i}}$. A family of Kernel Id primitive functions $\mathit{pack}^n$ with arity $n$ are used to pack
an encoded type constructor and its arguments into a run-time data-structure.

The decoding scheme TDec[[ ]] is used at run-time to convert the encoded type-hints into
actual type schemes used during run-time type reconstruction. Although this mechanism is
described as the logical inverse of encoding type schemes, the actual decoding format depends
on the data format used within the run-time system for type reconstruction.⁴
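
The distinction between producing an expression that builds a hint at run-time and producing the hint itself can be illustrated with a Python sketch. Everything here is an assumption made for illustration: t_enc compiles a type into a Python closure over the activation frame (standing in for a generated Kernel Id expression), packed hints are plain tuples (standing in for the pack primitives), and t_dec is the corresponding hypothetical decoder.

```python
def t_enc(ty, env):
    """Compile a type into code that builds its hint from a run-time frame.

    `env` plays the role of the translation environment: it maps free
    type-variables (strings starting with "'") to hint-parameter names.
    """
    if isinstance(ty, str) and ty.startswith("'"):
        param = env[ty]
        return lambda frame: frame[param]          # fetch the incoming hint
    if isinstance(ty, str):
        return lambda frame: (ty,)                 # nullary constructor
    name, *args = ty
    compiled = [t_enc(a, env) for a in args]       # compile arguments once
    return lambda frame: (name, *[c(frame) for c in compiled])

def t_dec(encoded):
    """Decode a packed hint back into a type (inverse of the packing above)."""
    name, *args = encoded
    return name if not args else (name, *[t_dec(a) for a in args])

# Compile-time: inside f3, the variable 't0 is carried by parameter f3_hint_1.
build_hint = t_enc(("list", "'t0"), {"'t0": "f3_hint_1"})

# Run-time: two activations of f3 supply different hints for the same code.
hint_bool = build_hint({"f3_hint_1": ("bool",)})
hint_int = build_hint({"f3_hint_1": ("int",)})
```

The same compiled code yields different hints in different activations, which is why the translation must emit hint-building code rather than a fixed encoded type.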

The program translation and hint generation scheme TExp[[ ]] is shown in Figure 7.4. This
translation is guided by the typing judgments derived from the typing rules shown in Figure 7.2.
The translation rules operate under a Translation Environment $\Gamma$ that maps free non-conserved
type-variables of a function definition to its type-hint parameters.



The careful reader might note that performing structural type matching on the type-hint sets may reject 
some programs that would be considered to be type-correct in the original Hindley/Milner type system without 
such sets. We will discuss this issue in Section 7.2.4. 

4 In our current implementation discussed in Chapter 8, the data format used for encoding type-hints is the
same as that used within the run-time system for type reconstruction; therefore, no decoding is necessary.






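$\Gamma \in$ Translation Environment $=$ Type-Variable $\to$ Identifier
$\overline{\sigma} \in$ Encoded Type Scheme
$\mathrm{TEnc}[\![\,]\!] \in$ Type Scheme $\to$ Translation Environment $\to$ Expression
$\mathrm{TDec}[\![\,]\!] \in$ Encoded Type Scheme $\to$ Type Scheme

Type Scheme Encoding

\[
\begin{array}{rcl}
\mathrm{TEnc}[\![\alpha]\!]\,\Gamma &=& \Gamma(\alpha) \\[4pt]
\mathrm{TEnc}[\![(T^n\ \tau_1 \cdots \tau_n)]\!]\,\Gamma &=& \{\ z_1 = (\mathrm{TEnc}[\![\tau_1]\!]\,\Gamma);\ \cdots;\ z_n = (\mathrm{TEnc}[\![\tau_n]\!]\,\Gamma); \\
 && \ \ z = \mathrm{pack}_{n+1}(\overline{T}^n, z_1, \ldots, z_n);\ \mathrm{in}\ z\ \} \\
 && \text{where } z, z_1, \ldots, z_n \text{ are new identifiers} \\[4pt]
\mathrm{TEnc}[\![\forall \alpha_1 \cdots \alpha_n.\tau]\!]\,\Gamma &=& \mathrm{TEnc}[\![\tau]\!]\,(\Gamma + \{\alpha_i \mapsto \overline{T}^0_{\alpha_i}\}), \quad 1 \le i \le n
\end{array}
\]

Type Scheme Decoding

\[
\begin{array}{rcl}
\mathrm{TDec}[\![\overline{\sigma}]\!] &=& \forall \alpha_1 \cdots \alpha_n.\ \mathrm{TDec}'[\![\overline{\sigma}]\!] \\
 && \text{where } \{\alpha_1, \ldots, \alpha_n\} = \mathcal{F}(\mathrm{TDec}'[\![\overline{\sigma}]\!]) \\[4pt]
\mathrm{TDec}'[\![\overline{T}^0_{\alpha}]\!] &=& \alpha \\[4pt]
\mathrm{TDec}'[\![(\overline{T}^n, \overline{\tau}_1, \ldots, \overline{\tau}_n)]\!] &=& (T^n\ \mathrm{TDec}'[\![\overline{\tau}_1]\!] \cdots \mathrm{TDec}'[\![\overline{\tau}_n]\!])
\end{array}
\]

Figure 7.3: Encoding and Decoding of Type Schemes.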
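
The encoding and decoding rules of Figure 7.3 can be illustrated with a small Python sketch. This is not the Kernel Id implementation: the sketch computes the encoded value directly instead of generating a Kernel Id expression that computes it, so its environment maps type-variables straight to encoded hints rather than to identifiers. The names `TVar`, `TCon`, `pack`, `t_enc`, `t_enc_scheme`, `t_dec_prime`, and `t_dec`, and the tuple layout of encoded hints, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: Tuple["Type", ...] = ()


Type = Union[TVar, TCon]


def pack(con: str, *args):
    """Hypothetical stand-in for the pack_n primitives: tag plus encoded arguments."""
    return ("tc", con, tuple(args))


def t_enc(ty: Type, env: Dict[str, tuple]):
    """Encode a type; free type-variables are looked up in the environment."""
    if isinstance(ty, TVar):
        return env[ty.name]
    return pack(ty.name, *(t_enc(a, env) for a in ty.args))


def t_enc_scheme(bound, body: Type, env: Dict[str, tuple]):
    """Encode a type scheme: each bound variable becomes a special constant."""
    ext = {**env, **{a: ("var", a) for a in bound}}
    return t_enc(body, ext)


def t_dec_prime(enc) -> Type:
    """Decode an encoded hint back into a type (the TDec' rules)."""
    if enc[0] == "var":
        return TVar(enc[1])
    _, con, args = enc
    return TCon(con, tuple(t_dec_prime(a) for a in args))


def free_vars(ty: Type):
    if isinstance(ty, TVar):
        return {ty.name}
    out = set()
    for a in ty.args:
        out |= free_vars(a)
    return out


def t_dec(enc):
    """Decode and generalize over the remaining type-variables (the TDec rule)."""
    body = t_dec_prime(enc)
    return tuple(sorted(free_vars(body))), body
```

For instance, encoding `list t0` under an environment in which `t0` is bound to the hint for `int` yields the encoding of `list int`, and decoding the encoding of a scheme such as the one for `list a` recovers the bound variable `a`.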



Most of the translation rules are straightforward. Constants do not require any translation.
The rules for primitive application, Case-expression, and block recursively translate their sub-
expressions.

The translation of a function identifier converts the exact instantiations of its non-conserved
type-variables into explicit type-hint arguments using the encoding shown in Figure 7.3. Simi-
larly, the translation of a function application inserts appropriate type-hints at that application
site as directed by the function signature.

The translation of a $\lambda$-expression adds explicit hint parameters $y_1 \cdots y_m$ at the appropriate
position corresponding to each non-conserved type-variable obtained from its typing judgment.
We also record this mapping as the hint-map of the $\lambda$-expression and use it to extend the
translation environment for the body of the given $\lambda$-expression.
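
The placement of hint parameters can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the hint sets for each argument position are given directly as lists of type-variable names (in the thesis they come from the typing judgment), the sets are assumed to be disjoint, and the function name `translate_abs` and the `y_` naming convention are hypothetical.

```python
def translate_abs(params, hint_sets):
    """Interleave hint parameters with the ordinary parameters of a lambda.

    params    -- the original parameters [x1, ..., xn]
    hint_sets -- n+1 lists of type-variable names: the hints needed before x1,
                 then after each of x1 .. x(n-1), and finally after xn.
    Returns the new parameter list and the hint-map (type-variable -> parameter).
    """
    assert len(hint_sets) == len(params) + 1
    hint_map = {}
    new_params = []

    def add_hints(tvars):
        for a in tvars:
            hint_map[a] = "y_" + a          # fresh hint parameter for this variable
            new_params.append(hint_map[a])

    for i, x in enumerate(params):
        add_hints(hint_sets[i])             # hints that precede x_(i+1)
        new_params.append(x)
    add_hints(hint_sets[-1])                # hints that follow the last parameter
    return new_params, hint_map


# A two-argument function that needs a hint for t0 after its first argument.
print(translate_abs(["x", "y"], [[], ["t0"], []]))
```

The returned hint-map is what would be added to the translation environment when translating the body.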

We assume that the type-map (Definition 6.1) of a $\lambda$-expression is updated to reflect the
new type-hint parameters that are added to its type signature and the new local bindings that
are created within its body during the translation. This change does not affect the set of
free type-variables of the type-map because encoded type-hints have a fixed, pre-defined non-
polymorphic type, and the types for all other additional identifier bindings are already present
within the type-map.

After this program transformation, all the type information needed to fully instantiate the
type-map of a function is available at run-time within its function activation frame, either
directly as run-time type-hints or indirectly via instantiations of conserved type-variables in
its type-map. In the next section, we will show a type reconstruction algorithm that uses this
information at run-time to reconstruct the complete dynamic state of the machine.






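$\Gamma \in$ Translation Environment $=$ Type-Variable $\to$ Identifier
$\mathrm{TExp}[\![\,]\!] \in$ Expression $\to$ Translation Environment $\to$ Expression

CONST:
\[ \mathrm{TExp}[\![c]\!]\,\Gamma = c \]

PRIMAPP: Let $z, z_1, \ldots, z_n$ be new identifiers,
\[
\begin{array}{rcl}
\mathrm{TExp}[\![PF_n\ SE_1 \cdots SE_n]\!]\,\Gamma &=& \{\ z_1 = (\mathrm{TExp}[\![SE_1]\!]\,\Gamma);\ \cdots;\ z_n = (\mathrm{TExp}[\![SE_n]\!]\,\Gamma); \\
 && \ \ z = PF_n\ z_1 \cdots z_n;\ \mathrm{in}\ z\ \}
\end{array}
\]

CASE: Let $z, z_0$ be new identifiers,
\[
\begin{array}{rcl}
\mathrm{TExp}[\![\mathrm{Case}_m\_T\ SE\ (E_1 \cdots E_m)]\!]\,\Gamma &=& \{\ z_0 = (\mathrm{TExp}[\![SE]\!]\,\Gamma); \\
 && \ \ z = \mathrm{Case}_m\_T\ z_0\ ((\mathrm{TExp}[\![E_1]\!]\,\Gamma) \cdots (\mathrm{TExp}[\![E_m]\!]\,\Gamma));\ \mathrm{in}\ z\ \}
\end{array}
\]

IDENT: Given typing judgment $TE \vdash x : \tau, \rho$ where $\rho = \{\tau_1 \cdots \tau_n\}$,
Let $z, z_1, \ldots, z_n$ be new identifiers,
\[
\begin{array}{rcl}
\mathrm{TExp}[\![x]\!]\,\Gamma &=& \{\ z_1 = (\mathrm{TEnc}[\![\tau_1]\!]\,\Gamma);\ \cdots;\ z_n = (\mathrm{TEnc}[\![\tau_n]\!]\,\Gamma); \\
 && \ \ z = x\ z_1 \cdots z_n;\ \mathrm{in}\ z\ \}
\end{array}
\]

APP: Given typing judgment $TE \vdash SE_1\ SE_2 : \tau, (\rho \cup \rho_1 \cup \rho_2)$ where $\rho = \{\tau_1 \cdots \tau_n\}$,
Let $z, z', z'', z_1, \ldots, z_n$ be new identifiers,
\[
\begin{array}{rcl}
\mathrm{TExp}[\![SE_1\ SE_2]\!]\,\Gamma &=& \{\ z' = (\mathrm{TExp}[\![SE_1]\!]\,\Gamma);\ z'' = (\mathrm{TExp}[\![SE_2]\!]\,\Gamma); \\
 && \ \ z_1 = (\mathrm{TEnc}[\![\tau_1]\!]\,\Gamma);\ \cdots;\ z_n = (\mathrm{TEnc}[\![\tau_n]\!]\,\Gamma); \\
 && \ \ z = z'\ z''\ z_1 \cdots z_n;\ \mathrm{in}\ z\ \}
\end{array}
\]

ABS: Given typing judgment $TE \vdash \lambda x_1 \cdots x_n.E : \rho_0.(\tau_1 \to \cdots \rho_{n-1}.(\tau_n \to \rho'.\tau_{n+1}) \cdots), \phi$
where $(\rho_0 \cup \cdots \cup \rho_{n-1} \cup \rho') = \{\alpha_1 \cdots \alpha_m\}$,
Let $y_1, \ldots, y_m$ be new parameters with hint-map $HM = \{\alpha_1 \mapsto y_1, \ldots, \alpha_m \mapsto y_m\}$,
\[ \mathrm{TExp}[\![\lambda x_1 \cdots x_n.E]\!]\,\Gamma = \lambda\, y_{\rho_0}\ x_1\ y_{\rho_1} \cdots y_{\rho_{n-1}}\ x_n\ y_{\rho'}.\,(\mathrm{TExp}[\![E]\!]\,(\Gamma + HM)) \]

BLOCK: Let $z$ be a new identifier,
\[
\begin{array}{rcl}
\mathrm{TExp}[\![\{x_1 = E_1;\ \cdots;\ x_n = E_n\ \mathrm{in}\ SE\}]\!]\,\Gamma &=& \{\ x_1 = (\mathrm{TExp}[\![E_1]\!]\,\Gamma);\ \cdots;\ x_n = (\mathrm{TExp}[\![E_n]\!]\,\Gamma); \\
 && \ \ z = (\mathrm{TExp}[\![SE]\!]\,\Gamma);\ \mathrm{in}\ z\ \}
\end{array}
\]

Figure 7.4: Program Translation and Hint Generation Rules.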






7.2.4 Discussion 

Type Mismatch in Curried Functions 

The augmented type system presented in Section 7.2.1 above is a straightforward modification
of the standard Hindley/Milner type system, but it has one drawback: it restricts the
set of type-correct programs to those that are also type-reconstructible. In particular, this
system may reject some programs that would have been considered type-correct in the usual
Hindley/Milner type system without any type-hint sets. The following example illustrates this
point:

Example 7.1: 

def f1 x y = y;          % f1 :: $\forall t_0 t_1.\ \phi.(t_0 \to \{t_0\}.(t_1 \to \phi.t_1))$

def f2 x =               % f2 :: $\forall t_0 t_1.\ \phi.(t_0 \to \phi.(t_1 \to \phi.t_1))$
    { def h2 y = y;
      in h2 };

g1 = (if ... then f1 else f2);             % Static Type Error!

g2 = if ... then f1 1 (int) else f2 1;     % No Static Type Error.

The functions f1 and f2 have the same type signature in the usual Hindley/Milner type
system but they have different type signatures in the current system because f1 requires a type-
hint for its first argument, while f2 does not. This is because the type of the first argument
of the function f1 is not conserved according to Definition 6.2, while both f2 and the internal
function h2 are considered to be type-conserving. This type mismatch shows up in the binding
for g1, which is flagged as a type-error in our augmented type system. However, the binding for
g2 may be typed without any problem because the type-hint required by the function f1 has
already been inserted.

The above example shows that our type system makes a subtle distinction between implicitly
curried multi-arity functions such as f1 and their explicitly curried counterparts such as f2. To
be precise, this difference shows up only in non-type-conserving functions as shown in the above
example; type-conserving functions would always have empty type-hint sets. This difference
exposes an important run-time characteristic of such functions: the number of applications
after which a function closure expands into an activation frame, which is controlled by their
syntactic arity and not by their semantic typing.

In a way, this difference is to be expected because f1 and f2 carry different objects within
the closures resulting from their first application and hence require different amounts of type
reconstruction information. Since f1 is implicitly curried, it simply records its first argument
within its closure. This forces the type conservation mechanism (Definition 6.2) to insert
additional type information in order to ensure subsequent type reconstruction of this argument
even if it was never used within the function's body. On the other hand, f2 produces an entirely
new closure h2 on its first application that is completely independent of its first argument. So,
there is no need to preserve its type within the returned closure.

It should be noted that the type mismatch between f1 and f2 is generated not merely
because we have chosen to represent the type-hint information explicitly within the type sig-
natures of these functions. This type mismatch is actually a consequence of the underlying
compilation mechanism that treats additional type-hints just like any other function parame-
ters. In particular, it would not be possible to correctly compile the binding for g1 even if the
type-hint analysis was done after the type-checking phase. This is because under our current
compilation strategy, only f1 requires a type-hint which is determined only after it is applied






to a particular argument. 

Multiple Type Signatures 

Another interesting difference between this type system and the usual Hindley/Milner type 
system shows up with higher-order functions that take other functions as arguments. Consider 
the map function shown earlier in Example 6.1 which is reproduced below: 

Example 7.2: 

def map f nil = nil        % map :: $\forall t_0 t_1.\ \phi.(t_0 \to \phi.t_1) \to \phi.((\mathrm{list}\ t_0) \to \phi.(\mathrm{list}\ t_1))$
  | map f (y:ys) =         % map :: $\forall t_0 t_1.\ \phi.(t_0 \to \{t_0\}.t_1) \to \phi.((\mathrm{list}\ t_0) \to \{t_0\}.(\mathrm{list}\ t_1))$
      (f y) : (map f ys);

def enlist x = x:nil;

g1 = map enlist (1:nil);            % No type-hint needed.

def ignore x y = y;

g2 = map ignore (1:nil) (int);      % Type-hint needed internally by ignore.

Two possible types for the map function are shown. The first type assumes that the input
function f is type-conserving and therefore would not need any type-hints when it is applied
within the body of map. This permits type-conserving functions such as enlist to be passed
to map as usual. The second type signature assumes that the incoming function would not be
type-conserving and would need a type-hint at its application within the body of map. This
type-hint propagates up to the definition of the map function and shows up in its type signature
after the second argument. This allows non-type-conserving functions such as ignore to be
passed as arguments to map.

It may be a little disconcerting to note that the map function no longer has a single type.
On the other hand, the two versions of map are truly different functions and must be compiled
as such: one that propagates type-hints and the other that does not. One can think of the
original Hindley/Milner type signature of the map function as being overloaded with the various
intended versions. The compiler may selectively produce these specialized versions according
to the type of the arguments supplied to map.

Alternate Compilation Scheme 

Both the problems presented above may be fixed by making the type-hint compilation more
uniform and transparent to the standard parameter passing mechanism. In this section, we
briefly examine one such compilation scheme.

Instead of inserting type-hints required at a given argument position as additional parame-
ters, we may put them in a separate record and pass a single pointer to that record to a fixed
entry point identified by that argument position. Effectively, this adds one additional parame-
ter for every argument position whether or not any type-hints are needed at that position. The
advantage of this compilation scheme is that it completely dissociates propagation of type-hints
from regular parameter passing, although it takes additional frame space and time overhead in
allocating type-hint records. In this scheme, empty type-hint records need not be propagated
at all, while non-empty type-hint records may be passed to even type-conserving functions that
do not require this information. The latter property fixes the problem of compiling g1 shown
in Example 7.1. Now, a type-hint record would be created for each application site of g1 which
would be used by f1 during type reconstruction but would be simply ignored by f2.






This scheme also makes the compilation of higher-order functions such as the map function
of Example 7.2 more uniform. Now the map function may be compiled to always propagate the
type-hint record it receives from its first argument position to its internal application site. If
no actual type-hint record is supplied from outside then this mechanism essentially propagates
an empty type-hint record to the internal application site. However, specialized versions of map
that do not pay this overhead may still be compiled as an optimization.
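
The idea of a separate type-hint record can be mimicked in Python to show why the binding for g1 in Example 7.1 becomes compilable. This is a rough analogy only, not the Id compilation scheme: the optional `hints` argument plays the role of the pointer to a type-hint record, the `captured` attribute stands in for the contents of a closure, and all names other than f1, f2, h2, and g1 are hypothetical.

```python
def f1(x, hints=None):
    """Implicitly curried two-argument function: its closure keeps x and the hint record."""
    def f1_final(y, hints2=None):
        return y
    # What a type reconstructor could later inspect inside the closure.
    f1_final.captured = {"x": x, "hints": hints}
    return f1_final


def f2(x, hints=None):
    """Explicitly curried: returns a fresh closure h2 that is independent of x."""
    def h2(y, hints2=None):
        return y
    h2.captured = {}          # the hint record is simply ignored
    return h2


def g1(flag):
    # Both branches now share one calling convention, so either may be chosen.
    return f1 if flag else f2


closure = g1(True)(1, {"t0": "int"})   # the application site always supplies a record
print(closure.captured, closure(2))
```

Because every argument position accepts a record, the caller does not need to know statically which of the two functions it is applying.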

7.3 Run-time Type Reconstruction 

7.3.1 Type Reconstruction Requirements 

Before we describe our type reconstruction algorithm, we summarize the requirements for full
type reconstruction as discussed in previous sections. We use both compile-time and run-time
information.

1. The compile-time information consists of the type-map (Definition 6.1), the hint-map
(Definition 6.3) and the arity of each function that is stored in the symbol table entry for
that function.

2. Furthermore, every function in the program must be transformed as shown in Section 7.2
to propagate explicit type-hints for its non-conserved type-variables.

3. The run-time information consists of the global dynamic environment and the root frame
of the activation tree that remain live and are assumed to be accessible throughout the
computation (Section 6.2.1). The activation tree hangs from the root activation frame
and is modified dynamically, as the program executes, by the procedure linkage code.

4. At program invocation time, complete type information is available for the user query
expression and the root activation frame (Section 6.2.1). Therefore, the root frame should
already be marked as reconstructed.

5. Given any activation frame, we should be able to identify the function associated with
it, its parent activation frame, and the application site in the parent frame that created
this activation frame. 5 Typically, the conventional return address information within the
callee is sufficient for this purpose.

6. A proper decoding mechanism should exist for types and type schemes encoded as type-
hints (Figure 7.3). A run-time mechanism for type unification is also required, although
it can be simplified considerably since static type-checking guarantees that unifications
performed within the reconstruction algorithm cannot fail.

7.3.2 The Reconstruction Algorithm 

Figure 7.5 shows the pseudo-code for the reconstruction algorithm Reconstruct-Frame
which is invoked at run-time to reconstruct the types of all variables in a given activation
frame. Reconstruct-Frame takes an activation frame as a parameter and returns a fully
instantiated type-map for that frame. For ease of presentation, the algorithm makes use of
several auxiliary functions which we will explain as we go along.



5 We ignore the issue of "tail calls" whose compilation was discussed in Section 6.5.






Reconstruct-Frame(activation-frame)
    $\triangleright$ Return if already reconstructed.
 1  if Frame-Reconstructed(activation-frame)
 2    then return Frame-Type-Map(activation-frame)
    $\triangleright$ Otherwise, start reconstruction.
 3  activation-fn $\leftarrow$ Activation-Fn(activation-frame)
    $\triangleright$ Copy the function's type-map.
 4  type-map $\leftarrow$ Type-Map(activation-fn)
 5  $\{\alpha_1, \ldots, \alpha_n\} \leftarrow \mathcal{F}$(type-map)
 6  $S_{copy} \leftarrow \{\alpha_i \mapsto \beta_i\}$ where $\beta_1, \ldots, \beta_n$ are new.
    $\triangleright$ Process the type-hints.
 7  hint-map $\leftarrow$ Hint-Map(activation-fn)
 8  $S_{hint} \leftarrow$ { forall $(\alpha \mapsto x)$ in hint-map
 9               $\overline{\sigma} \leftarrow$ Fetch-Argument($x$, activation-frame)
10               $\sigma \leftarrow \mathrm{TDec}[\![\overline{\sigma}]\!]$
11               collect $(S_{copy}\,\alpha \mapsto \sigma)$ }
    $\triangleright$ We are done if the type-map is fully instantiated.
12  if $\mathcal{F}(S_{hint} S_{copy}$(type-map)$) = \phi$
13    then Frame-Type-Map(activation-frame) $\leftarrow S_{hint} S_{copy}$(type-map)
    $\triangleright$ Otherwise, obtain call site information from the parent.
14    else { parent-activation-frame $\leftarrow$ Parent-Activation-Frame(activation-frame)
15           parent-type-map $\leftarrow$ Reconstruct-Frame(parent-activation-frame)
16           $\tau_{use} \leftarrow$ Use-Type(activation-frame, parent-type-map)
17           $\tau_{def} \leftarrow$ Def-Type(activation-fn, $S_{copy}$(type-map))
18           if Full-App(activation-frame, parent-type-map)
19             then $S_{def\text{-}use} \leftarrow$ Unify($\tau_{def}, \tau_{use}$)
20             else { $k \leftarrow$ Arity(activation-fn)
21                    $\tau_1 \to \cdots \to \tau_k \to \tau_{k+1} \leftarrow \tau_{def}$
22                    $S_{def\text{-}use} \leftarrow$ Unify($\tau_k \to \tau_{k+1}, \tau_{use}$) }
23           Frame-Type-Map(activation-frame) $\leftarrow S_{def\text{-}use} S_{hint} S_{copy}$(type-map) }
24  Frame-Reconstructed(activation-frame) $\leftarrow$ true
25  return Frame-Type-Map(activation-frame)

Figure 7.5: The Type Reconstruction Algorithm. 






Reconstruct-Frame is divided into several sections. We begin at Line 1 by checking if
the given activation frame has already been reconstructed. If so, the previously recorded frame
type-map is returned immediately. Otherwise, we initiate the reconstruction process.

The first section, Lines 3-6, initializes the data-structures used in the reconstruction. We
extract the name of the current activation function from the given frame using the selector
function Activation-Fn and instantiate its type-map with fresh type-variables by building a
type substitution $S_{copy}$ for its free type-variables. This is necessary so that types from multiple
activations of the same polymorphic function do not inadvertently interfere with each other.

The next section, Lines 7-11, builds a type substitution $S_{hint}$ for all the non-conserved
type-variables of the function as prescribed by its hint-map. The type-hint corresponding to
each hint parameter present in the hint-map is fetched from the activation frame and then
decoded according to Figure 7.3. The resulting type schemes are the run-time instantiations of
the non-conserved type-variables present in the hint-map.

Following this, Line 12 checks to see if all free type-variables of the type-map have been
instantiated to either ground or polymorphic types. If so, the reconstruction is complete and
the fully instantiated type-map is recorded at Line 13. The Frame-Reconstructed flag is
set at Line 24 and the reconstructed type-map is returned at Line 25.

If the test fails at Line 12, Lines 14-22 obtain the remaining information from the activation
tree as follows. First, the type-map of the parent of the current activation is reconstructed by
calling Reconstruct-Frame recursively with the parent's activation frame. Using this type-
map and the current activation, the auxiliary function Use-Type obtains the reconstructed
type-instance of the call site responsible for invoking the current function (see item 5 of Defini-
tion 6.1). This type-instance, $\tau_{use}$, is then unified with the defined type of the current function,
$\tau_{def}$, that is available within the current type-map. This unification fully instantiates all the
remaining type-variables in the current type-map which is recorded at Line 23 and is returned
at Line 25 as before.

The matching of $\tau_{def}$ to $\tau_{use}$ is slightly complicated by the fact that the current activation
could either be a full application of a $k$-arity function to all its arguments or it could simply
be the final ($k$-th) application of a curried function closure that has already accumulated $k - 1$
arguments in previous partial applications. The recorded application site type instance $\tau_{use}$
would be different in these two cases and therefore it must be properly aligned before matching
with the function's full type signature $\tau_{def}$. 6 This application information is also recorded within
the parent's type-map and is obtained at Line 18 using the auxiliary function Full-App. In
case of a full application, $\tau_{use}$ is directly unified with $\tau_{def}$ recorded in the current function's
type-map at Line 19. In case of a curried application, $\tau_{use}$ must be unified with just the final
application type $\tau_k \to \tau_{k+1}$ of the defined type $\tau_{def}$ as shown at Line 22.
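
The control flow of Reconstruct-Frame can be sketched in Python as shown below. This is a simplified model under several assumptions, not the implementation described in Chapter 8: a type is either a string (a type-variable) or a tuple `(constructor, arg1, ...)`, with function types written `("->", argument, result)`; hint sets inside types are omitted; type-hints are assumed to be stored in the frame already decoded; the function's own defined type is stored in its type-map under its name; each frame records the key (`call_site`) under which its parent's type-map holds the call-site type instance, together with a `full_app` flag; and unification omits the occurs check because static typing rules out failure. The classes `Fn` and `Frame` and all helper names are hypothetical.

```python
import itertools

_fresh = itertools.count()


def subst(s, t):
    """Apply substitution s (type-variable -> type) to type t."""
    if isinstance(t, str):
        return subst(s, s[t]) if t in s else t
    return (t[0],) + tuple(subst(s, a) for a in t[1:])


def free_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(free_vars(a) for a in t[1:]))


def unify(a, b, s):
    """Extend s so that a and b become equal; assumed never to fail."""
    a, b = subst(s, a), subst(s, b)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    assert a[0] == b[0] and len(a) == len(b), "cannot happen in a well-typed program"
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
    return s


class Fn:
    def __init__(self, name, arity, type_map, hint_map):
        self.name, self.arity = name, arity
        self.type_map, self.hint_map = type_map, hint_map


class Frame:
    def __init__(self, fn, slots, parent, call_site, full_app=True, type_map=None):
        self.fn, self.slots, self.parent = fn, slots, parent
        self.call_site, self.full_app = call_site, full_app
        self.type_map = type_map          # None until the frame is reconstructed


def reconstruct_frame(frame):
    if frame.type_map is not None:                                   # Lines 1-2
        return frame.type_map
    fn = frame.fn                                                     # Lines 3-4
    tvars = set().union(*(free_vars(t) for t in fn.type_map.values()))
    s_copy = {a: f"{a}#{next(_fresh)}" for a in sorted(tvars)}        # Lines 5-6
    s = dict(s_copy)
    for a, x in fn.hint_map.items():                                  # Lines 7-11
        s[s_copy[a]] = frame.slots[x]
    inst = {x: subst(s, t) for x, t in fn.type_map.items()}
    if any(free_vars(t) for t in inst.values()):                      # Line 12
        parent_map = reconstruct_frame(frame.parent)                  # Lines 14-15
        t_use = parent_map[frame.call_site]                           # Line 16
        t_def = fn.type_map[fn.name]                                  # Line 17
        if not frame.full_app:                                        # Lines 18-22
            for _ in range(fn.arity - 1):
                t_def = t_def[2]          # keep only the final application type
        s = unify(t_def, t_use, s)
        inst = {x: subst(s, t) for x, t in fn.type_map.items()}
    frame.type_map = inst                                             # Lines 13, 23, 24
    return inst                                                       # Line 25
```

As a usage example, consider a frame for the final application of the two-argument function f1 of Example 7.1, whose first argument type is supplied by a hint and whose second is recovered from the call site:

```python
INT, BOOL = ("int",), ("bool",)
f1 = Fn("f1", 2,
        {"f1": ("->", "t0", ("->", "t1", "t1")), "x": "t0", "y": "t1"},
        {"t0": "f1_hint_1"})
root = Frame(None, {}, None, None, type_map={"site": ("->", INT, INT)})
frame = Frame(f1, {"f1_hint_1": BOOL}, root, "site", full_app=False)
print(reconstruct_frame(frame))
```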

7.3.3 Reconstruction Complexity 

A few observations about the reconstruction algorithm are worth pointing out. First, the entire
activation frame of a function is reconstructed at once. This is possible because the types of all
the objects present in an activation frame share the same set of free type-variables which are
precisely captured and instantiated using its type-map. This obviates the need to traverse the
activation tree multiple times in order to reconstruct the types of various identifiers belonging
to the same frame.

Second, we cache the reconstructed type-maps of all activation frames for future references
by their child frames. Therefore, no activation frame may need to be reconstructed more than



6 In our earlier paper [AC93], this operation was abstracted into the auxiliary function Unify-Aligned.






once. Furthermore, since the root frame is already marked as reconstructed at the start of
the program, the algorithm is guaranteed to terminate properly as it recursively climbs the
activation tree at Line 15.

Finally, the algorithm climbs the activation tree from the current activation frame only as far
up as necessary. The climbing process terminates at the first ancestor frame that has already
been reconstructed, or earlier if sufficient information is available via the type-hints. This
avoids traversing the activation tree from the root activation frame to all its leaves as suggested
in [Gol91] which would involve reconstructing all the activation frames within the dynamic
activation tree. Our algorithm pays only incremental cost for each request for reconstruction,
which is a very useful feature for interactive applications such as a source debugger.

The cost of the algorithm Reconstruct-Frame shown in Figure 7.5 depends on the 
following factors: 

1. The number of ancestor frames reconstructed due to recursive calls to the algorithm 
Reconstruct-Frame at Line 15. 

2. The cost of decoding the type-hints at Lines 7-11. 

3. The cost of unification at Line 19 or Line 22. 

The maximum number of ancestor frames reconstructed in a given call to the algorithm
Reconstruct-Frame is bounded by the number of frames occurring between the current
activation frame and the root frame. In a sequential system, this is all the frames sitting on
the stack. In a parallel system, this is the number of frames on any path from a leaf to the
root in the dynamic activation tree which is only the depth of the dynamic activation tree and
not its overall size. Of course, since all reconstructed type-maps are cached, the overall cost of
reconstructing every frame within the dynamic activation tree is still linear in the total number
of activation frames, assuming a unit cost for type unification and type-hint decoding.
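
The effect of caching on the number of frames visited can be illustrated with a small Python model. It abstracts away all type manipulation and only counts frames: the class `Frame`, the flag `hints_suffice` (standing for the case where type-hints alone fully instantiate the type-map), and the helper `chain` are assumptions made for this sketch.

```python
class Frame:
    def __init__(self, parent=None, hints_suffice=False):
        self.parent = parent
        self.hints_suffice = hints_suffice
        self.reconstructed = parent is None   # only the root starts out reconstructed


def reconstruct(frame):
    """Return the number of frames newly reconstructed by this request."""
    if frame.reconstructed:
        return 0
    work = 1
    if not frame.hints_suffice:
        work += reconstruct(frame.parent)     # climb only as far as necessary
    frame.reconstructed = True
    return work


def chain(start, depth):
    """Build a path of `depth` new frames below `start` and return the last one."""
    frame = start
    for _ in range(depth):
        frame = Frame(parent=frame)
    return frame


root = Frame()
leaf = chain(root, 5)
print(reconstruct(leaf))                        # every frame on the path is visited once
print(reconstruct(leaf))                        # a repeated request costs nothing
print(reconstruct(Frame(parent=leaf.parent)))   # a sibling reuses its cached parent
```

In this model, the first request costs as many steps as the depth of the frame, while later requests in the same part of the tree cost only the frames not yet reconstructed.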

The cost of decoding the type-hints depends on the number of non-conserved type-variables
in the type-map and the size of their run-time type instantiations. Similarly, the cost of unifi-
cation is proportional to the size of the function's instantiated type. Although it is possible to
write functions whose Hindley/Milner types are exponentially large compared to the size of the
function itself [Mai90], such cases are rare. Typically, functions possess small type signatures
that can be efficiently manipulated using graphical representations. Non-conserving functions
are rare as well and run-time type instantiations of non-conserved type-variables are also small.

The interesting observation here is that the cost of reconstructing a type-map for a given
activation frame does not depend on the number of slots in the activation frame or the total size
of the type-map itself, but only on the size of the type signature of the corresponding activation
function. This is because we never need to examine or copy the type of every identifier recorded
in the type-map during its reconstruction. We only instantiate its free type-variables. 7

7.4 Correctness of the Type Reconstruction Algorithm 

In this section, we will show that the type reconstruction algorithm given in Figure 7.5 is correct,
i.e., it infers the exact type for every object at any time during the execution of a program.
We will define the notion of the exact type for an object shortly, but for the time being it
may be viewed as the type that would have been attached to the object had we computed and


7 An independent program such as a debugger or a garbage collector may ultimately need to examine the
reconstructed types of every element in the activation frame. That cost is not included in reconstruction. 






propagated source type information all through the execution of the program. In dynamically-
typed languages such as Lisp, this is exactly how dynamic type-checking is performed. Every
object is tagged with its type and that information is carried through each computation step.
The type of every new object (including scalars such as integers and floats) is computed along
with its value and is attached to the value as its tag. Of course, computing types is a substantial
overhead during program execution which is why we have chosen to perform dynamic type
reconstruction instead of dynamic type maintenance.

The Kernel Id language (Figure 7.1), its run-time execution model (Figure 6.2), and the
type reconstruction algorithm (Figure 7.5) are all quite complex. In order to be able to argue
about the correctness of the algorithm, we make several theoretical simplifications. These
simplifications allow us to model these concepts cleanly and distill the basic characteristics of
the reconstruction algorithm.

7.4.1 Simple Expression Language and its Semantic Model 

As the first step, we restrict ourselves to the simple expression language described in Chapter 3.
This is because we have already made a considerable effort to rigorously define the static and
dynamic semantics for this language. We already have an operational semantic model for
this language (Definition 3.12) and we have shown the consistency between the static and the
dynamic semantics (Theorem 3.16). This consistency is the main tool with which we will show
the correctness of our type reconstruction algorithm. It may be noted that the problem of
complete type reconstruction is independent of the issue of parallel or sequential execution.
Therefore, restricting ourselves to a strict, sequential language instead of dealing with the fully
parallel execution model of Id does not affect the reconstruction algorithm or the issue of its
correctness.

It is easy to see the correspondence between the Kernel Id language shown in Figure 7.1 and
the simple expression language shown in Section 3.1.1. Most Kernel Id expressions have direct
analogues in the simple expression language. The important simplifications are that mutually
recursive functions must be combined into a single self-recursive function, Case-expressions
must be broken up into a series of conditional expressions, and blocks must be converted into
nested let-bindings of mutually recursive definitions.

7.4.2 Partial Execution and the Dynamic Activation Tree 

The second step is to model the state of the machine at the moment when type reconstruction
is requested. In the relational formulation of the dynamic semantics shown in Section 3.1.2,
an evaluation of a top-level query expression may be described by a logical derivation tree of
evaluation judgments of the following form: 8

$e \vdash a/s \Rightarrow v/s'$

The evaluation derivation tree for the top-level query provides a logical proof of how eval- 
uations of sub-expressions contribute towards the final result of the entire program according 
to the dynamic inference rules shown in Figure 3.1. We will now treat this derivation tree of 
evaluation judgment relations as representing the computation itself. The complete derivation 
tree for the top-level query corresponds to the entire program computation. Each judgment 



8 We assume that the result of the evaluation is not err. This is because we assume that the entire pro-
gram including the top-level query expression is type-correct. Therefore, by the Soundness Theorem 3.16, the 
evaluation can never run into a run-time type-error. 








e0 |- (f3 = ...; g3 = ...; g3 2)/s0 => _/_    [Root]
  e0 |- (f3 f3_hint_1 x = ...)/s0 => <clsr f3...>/s0
  e0+{f3 -> <clsr f3...>}=e1 |- (g3 = ...; g3 2)/s0 => _/_
    e1 |- (if ... then ... else ...)/s0 => <clsr h3...>/s0
      e1 |- false/s0 => false/s0
      e1 |- f3 (tc "bool" nil) (true:nil)/s0 => <clsr h3...>/s0
        e1 |- f3/s0 => <clsr f3...>/s0
        e1 |- (tc "bool" nil)/s0 => <tc "bool" nil>/s0
        e1 |- (true:nil)/s0 => <cons true nil>/s0
        e0+{f3_hint_1 -> ... , x -> ...}=e3 |- (h3 = ...; h3 f3_hint_1)/s0 => <clsr h3...,e3>/s0    [f3]
    e1+{g3 -> <clsr h3...>}=e2 |- (g3 2)/s0 => _/_
      e2 |- g3/s0 => <clsr h3...,e3>/s0
      e2 |- 2/s0 => 2/s0
      e3+{z -> 2}=e4 |- (if length x == 1 then ... else ...)/s0 => _/_    [h3]



Figure 7.6: The Evaluation Derivation Tree for Example 7.3. 



in this tree may be considered as providing a place-holder for the initial store and the final
result (value and a new store) computed within that judgment. The store is sequentialized
through the entire tree in a predictable depth-first fashion, while the values propagate from
the leaves of the tree towards the root, with the value of the top-level query being the value of the
whole computation. Values may also be passed from one branch of the tree to the other via
the environment.

The overall process of evaluation may be viewed as a step-wise unfolding of the evaluation
derivation tree. We start with the top-level query evaluation judgment using the initial dynamic
environment, the initial store, and an empty place-holder for the result. In order to compute
the overall result, the top-level evaluation judgment unfolds into a set of antecedent judgments
needed by the dynamic inference rule that is selected according to the immediate structure of
the query expression. Each such unfolding creates empty place-holders for the results of inter-
mediate evaluation judgments. On reaching the leaves, values are created spontaneously using
CONST, IDENT, or ABS rules, and are used to fill the place-holders for the leaf judgments. On
each successive computation step, these values fill the place-holders of their parent judgments,
until they reach an inference rule with multiple antecedents such as APP, TUPLE, or LET rules,
in which case a new sub-tree of evaluation judgments is spawned.

As an example of this process, consider the computation shown in Example 6.8 which is
reproduced below: 

Example 7.3: 

def f3 f3_hint_1 x =
    { def h3 h3_hint_1 z = if length x == 1
                           then z:nil
                           else z:z:nil;
      in h3 f3_hint_1 };
g3 = if ...
     then f3 (tc "int" nil) (1:nil)
     else f3 (tc "bool" nil) (true:nil);
g3 2;

This program has been translated according to the scheme presented in Section 7.2 with the 
appropriate type-hints added. The evaluation derivation tree for this computation is depicted in 
Figure 7.6. Each incomplete evaluation judgment in the derivation tree is expanded downwards 
into its antecedent judgments according to the dynamic inference rules of Figure 3.1. Not all 
branches of the derivation tree have been expanded yet. An empty place-holder (_) is used to 
represent an unknown value or a store within incomplete or unexplored judgments. In addition,
we also collapse the sub-trees that have been fully evaluated (shown in light typeface) up to 
the highest completed evaluation judgment. 

Such a partially expanded evaluation derivation tree may be used to model the exact state 
of a computation at any given point in time: 

Definition 7.1 (Partial Execution Tree) A partial execution tree is a structural tree-prefix 9
of the complete evaluation derivation tree for the top-level query expression with the following
characteristics:

1. Each node consists of a possibly incomplete evaluation judgment of the form
$e \vdash a/s \Rightarrow v/s'$.

2. Sub-trees consisting entirely of complete evaluation judgments are collapsed into a leaf
judgment $e \vdash a/s \Rightarrow v/s'$ corresponding to the highest evaluation judgment that has
received its value. These nodes represent terminated computation.

3. Internal evaluation judgments $e \vdash a/s \Rightarrow \_/\_$ that have been expanded but not yet fully
evaluated contain empty place-holders (_) to receive their values. These nodes represent
the active machine state.

4. Unexpanded judgments $e \vdash a/\_ \Rightarrow \_/\_$ are also represented by a leaf with empty place-
holders. These nodes represent the computation to be spawned in the future.
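
The three kinds of nodes in Definition 7.1 can be modelled with a small Python data-structure. This is a sketch for illustration only: the class name `Judgment`, its fields, the state labels, and the use of `None` as the empty place-holder are assumptions; stores are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Judgment:
    env: str                                   # name of the dynamic environment, e.g. "e1"
    expr: str                                  # the expression being evaluated
    result: Optional[Any] = None               # None plays the role of the place-holder (_)
    children: List["Judgment"] = field(default_factory=list)

    def state(self) -> str:
        if self.result is not None:
            return "terminated"                # collapsed leaf that has received its value
        return "active" if self.children else "pending"

    def complete(self, value: Any) -> None:
        """Fill the place-holder and collapse the sub-tree below this judgment."""
        self.result = value
        self.children = []


# A fragment of the tree in Figure 7.6.
root = Judgment("e0", "f3 = ...; g3 = ...; g3 2")
defn = Judgment("e0", "f3 f3_hint_1 x = ...", result="<clsr f3...>")
rest = Judgment("e1", "g3 = ...; g3 2")
root.children = [defn, rest]
print(root.state(), defn.state(), rest.state())
```

Here the root is an active node, the completed definition of f3 is a terminated leaf, and the remaining block is a pending judgment until it is expanded.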

Note that if we model the store independently as an external data-structure rather than
threading it sequentially through the judgments, we can model parallel computation within
this framework by spawning several branches of the partial execution tree in parallel. The only
modification needed in the current dynamic semantics to model this situation would be to use
a least-upper-bound ($\sqcup$) operation on stores that would combine stores from various branches
of the execution tree into a single store.
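
A minimal sketch of such a least-upper-bound operation is given here in Python. It assumes that a store is a dictionary from locations to values and that parallel branches never disagree on the contents of a location; a disagreement is reported as an error. The function name `store_lub` and this treatment of conflicts are assumptions of the sketch, not part of the dynamic semantics of Chapter 3.

```python
def store_lub(*stores):
    """Combine the stores produced by parallel branches into a single store."""
    merged = {}
    for store in stores:
        for location, value in store.items():
            if location in merged and merged[location] != value:
                raise ValueError(f"conflicting contents at location {location!r}")
            merged[location] = value
    return merged


initial = {"l0": 1}
left = {**initial, "l1": "a"}    # branch that allocated l1
right = {**initial, "l2": "b"}   # branch that allocated l2
combined = store_lub(left, right)
```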

It is useful to draw a correspondence between the actual dynamic activation tree at any given 
time during the execution of a program and its partial execution tree as described above. This 
may be seen by comparing Figure 6.6, which shows the dynamic activation tree for Example 7.3,
with Figure 7.6, which shows its partial execution tree. The following correspondences emerge:



⁹ A sub-tree starting at the root of the original tree with some of its branches clipped is called a tree-prefix of
the tree.






1. The root frame of the dynamic activation tree corresponds to the root judgment of the 
partial execution tree which is evaluating the top-level query expression provided by the 
user. 

2. The type of the query expression is completely known at the beginning of the computation 
which corresponds to the fact that the root frame in the dynamic activation tree is always 
marked as reconstructed. 

3. Each activation frame present in the dynamic activation tree corresponds to a subset of 
evaluation judgments within the partial execution tree that belong to the body of the 
applied function and hang from its application evaluation judgment. In other words,
each application evaluation judgment within the partial execution tree may be viewed as 
initiating a new activation frame for the applied function. 

4. Collapsing evaluation sub-trees for completed evaluation judgments corresponds to the 
fact that the activation frames within that branch of the computation have been deallo- 
cated and just the final value is available within the current frame. 

With the above correspondence in mind, the partial execution tree serves as an accurate
logical model of the actual dynamic activation tree. 

7.4.3 Type Reconstruction 

Given the definition of the dynamic state of the machine as a partial execution tree, type
reconstruction may be viewed as the process of computing the exact type of each value present
in the partial execution tree at any given time. Using the formal machinery at hand, this
corresponds to generating a type derivation tree using the static semantics inference rules
shown in Figure 3.2, that parallels the structure of the given partial execution tree. This is
captured in the following definition:

Definition 7.2 (Type Reconstruction) Type reconstruction of a given partial execution tree 
is defined to be a type derivation tree with the same structure as the partial execution tree with 
the following characteristics: 

1. For each evaluation judgment in the partial execution tree of the form $e \vdash a/s \Rightarrow v/s'$,
where $s$, $v$, and $s'$ may be empty place-holders, the type derivation tree has a corresponding
valid elaboration judgment of the form $E \vdash a : \tau$. Furthermore, the type $\tau$ is the most
general type satisfying this elaboration.

2. For each completed evaluation judgment of the form $e \vdash a/s \Rightarrow v/s'$ and the corresponding
typing judgment $E \vdash a : \tau$, there exists a store typing $S$ such that $S \models e : E$ and $\models s : S$.

Using the Soundness Theorem 3.16, the second condition in the above definition immediately
allows us to conclude that the computed value $v$ is consistent with the type $\tau$ under a suitably
constructed new store typing. In addition, the first condition ensures that this is the most
general type of the value $v$. Therefore, this type $\tau$ is taken to be the exact reconstructed type
of the value $v$.






7.4.4 The Type Reconstruction Algorithm 

The reconstruction algorithm shown in Figure 7.5 reconstructs one activation frame at a time,
although it may be applied to each frame within the current dynamic activation tree to
reconstruct the whole state of the machine. The actual order in which frames are reconstructed
is not important, nor is the fact that we cache the reconstructed type-maps. Therefore, we
will assume that all frames in the dynamic activation tree are reconstructed in one sweep that
starts at the root frame and works its way downwards towards the leaf frames. This does not
change the correctness problem because we are interested in showing the correctness of what
the algorithm computes, not necessarily how it computes it.

As shown earlier, we have modeled the dynamic activation tree as a partial execution tree
(Definition 7.1), and the process of type reconstruction as constructing a type derivation tree
for it (Definition 7.2). Therefore, all we need to do now is to show that our type reconstruction
algorithm given in Figure 7.5 indeed constructs a valid, most general type derivation tree
according to Definition 7.2. To accomplish this, we need to abstract the reconstruction
algorithm in terms of traversing the partial execution tree and constructing the corresponding type
derivation tree.

The first observation to be made about the reconstruction algorithm shown in Figure 7.5 
is that it reconstructs an entire frame at a time by instantiating the static type-map of the 
function corresponding to that frame. The static type-map of a function corresponds to the 
most general, static type derivation tree of its body. This is because the static type-map records
the compile-time type of every sub-expression and free identifier occurring within the body of
the function (Definition 6.1). Furthermore, these types are computed using the type inference
algorithm Infer mentioned in Section 3.4. The soundness of this algorithm (Proposition 3.19)
ensures that we can construct a valid type derivation tree for the entire body of the function,
while its completeness (Proposition 3.20) ensures that we obtain the most general type for each
sub-expression.

Thus, instantiating the static type-map of a function with a substitution can be viewed as
instantiating the entire static type derivation tree of the function body with that substitution. 
The validity of the derivation tree after substitution is ensured by the stability of typing judg- 
ments under substitution (Proposition 3.9). Note that the structure of the instantiated type 
derivation tree matches the portion of the partial execution tree that corresponds to the activa- 
tion frame being reconstructed. Sub-trees that are completely evaluated and hence have been 
collapsed to a leaf in the partial execution tree may also be collapsed in the typing derivation 
tree. 
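
As an illustration of this step, the following Python sketch applies a substitution to every entry of a type-map. The encoding is an assumption made for the example only: type-variables and base types are strings (type-variables written `*0`, `*1` as in Id), constructed types are tuples headed by their constructor, and the names `apply_subst` and `instantiate_type_map` are hypothetical.

```python
def apply_subst(subst, ty):
    """Apply a substitution (type-variable -> type) to a type term."""
    if isinstance(ty, str):                 # type-variable or base type
        return subst.get(ty, ty)
    head, *args = ty                        # constructed type, e.g. ("list", t)
    return (head, *(apply_subst(subst, a) for a in args))


def instantiate_type_map(type_map, subst):
    """Instantiate every entry of a static type-map with the same substitution."""
    return {name: apply_subst(subst, ty) for name, ty in type_map.items()}


# Hypothetical static type-map for the body of  map f xs.
static_map = {
    "f": ("->", "*0", "*1"),
    "xs": ("list", "*0"),
    "result": ("list", "*1"),
}
frame_map = instantiate_type_map(static_map, {"*0": "int", "*1": "bool"})
```

Because one substitution is applied uniformly to all entries, the instantiated map keeps the shape of the static map, which mirrors the way the instantiated derivation tree keeps the structure of the static one.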

The second observation about the reconstruction algorithm is regarding the construction
of the instantiating substitution $S_{\text{def-use}} S_{\text{hint}} S_{\text{copy}}$ for the callee's type-map. The purpose of
this substitution is to fully instantiate the static type-map of the callee according to the types
of its actual arguments and the result, so that the corresponding type derivation tree for the
callee's body matches the application site in the caller's derivation tree. The two independent
components¹⁰ $S_{\text{def-use}}$ and $S_{\text{hint}}$ are responsible for two different sets of arguments supplied to the
callee. The substitution $S_{\text{def-use}}$ conveys the type instantiation information due to the arguments
and the result present at the final application site, while the substitution $S_{\text{hint}}$ provides the
instantiation information due to the arguments supplied at previous partial application sites.

The compiler support for type-hint generation and propagation (Section 7.2) provides the 
mechanism by which we make the relevant type information available at the final application 



¹⁰ The third component $S_{\text{copy}}$ simply serves to make a copy of the type-map and therefore does not provide
any additional information.






site. The most important property of this mechanism is type conservation (Definition 6.2)
which ensures that the exact type instantiation for every type-variable within a function's
type-map is preserved at each of its application sites. For non-conserved type-variables, the
type-hint generation and propagation phase described in Section 7.2.3 encodes their dynamic
type instantiations at each partial application site and stores them within the returned closure.
This ensures that these type instantiations would remain accessible in encoded form even when
the computation that produced the closure has terminated. The substitution $S_{\text{hint}}$ during type
reconstruction represents these type-variable instantiations. For conserved type-variables, the
type of the arguments present at the final application site within the type derivation tree of
the caller provides their exact instantiation. The substitution component $S_{\text{def-use}}$ captures
these instantiations. Type conservation (Definition 6.2) guarantees that together these two
substitutions fully instantiate all the type-variables present within the callee's type-map in
accordance with the types of the actual arguments and the result of the function application.

As discussed above, the reconstruction algorithm ensures that the instantiated type derivation
tree computed for the callee's body matches its application site within the type derivation
tree of the caller's body. This process effectively "glues" the instantiated type derivation tree
of the callee's body at the APP rule within the caller's type derivation tree, producing a single
typing derivation tree that structurally corresponds to the partial execution tree across this
application. Below, we formalize the construction of the type derivation tree in the above manner
and show its consistency with respect to the current partial execution tree.
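
The interplay of the two substitution components can be sketched in Python as follows. This is a simplified illustration, not the algorithm of Figure 7.5: types use a hypothetical tuple encoding, `match` performs one-way matching of the callee's static type against the exact type found at the final application site (standing in for $S_{\text{def-use}}$), and `hints` is assumed to be the already decoded content of the type-hints stored in the closure (standing in for $S_{\text{hint}}$).

```python
def match(static, actual, subst):
    """One-way matching: extend subst so that subst(static) == actual."""
    if isinstance(static, str) and static.startswith("*"):    # type-variable
        if subst.setdefault(static, actual) != actual:
            raise TypeError(f"inconsistent instantiation of {static}")
    elif isinstance(static, str) or isinstance(actual, str):  # base type
        if static != actual:
            raise TypeError(f"cannot match {static} against {actual}")
    else:                                                     # constructed type
        if static[0] != actual[0] or len(static) != len(actual):
            raise TypeError(f"cannot match {static} against {actual}")
        for s, a in zip(static[1:], actual[1:]):
            match(s, a, subst)
    return subst


def apply_subst(subst, ty):
    """Apply a substitution to a type term."""
    if isinstance(ty, str):
        return subst.get(ty, ty)
    return (ty[0], *(apply_subst(subst, a) for a in ty[1:]))


def reconstruct_frame(type_map, static_site_type, actual_site_type, hints):
    """Instantiate a callee's type-map from the call site and the stored hints."""
    s_def_use = match(static_site_type, actual_site_type, {})  # conserved variables
    s_hint = dict(hints)                                       # non-conserved variables
    subst = {**s_hint, **s_def_use}
    return {name: apply_subst(subst, ty) for name, ty in type_map.items()}


# Hypothetical callee  def f x y = ...  with x : list *0, y : *1, result : *1.
# After the partial application (f x), the final site only sees *1 -> *1,
# so *0 is assumed to be non-conserved and is recovered from a stored hint.
frame = reconstruct_frame(
    {"x": ("list", "*0"), "y": "*1", "result": "*1"},
    ("->", "*1", "*1"),
    ("->", "int", "int"),
    {"*0": "bool"},
)
```

In the example, neither component alone instantiates the whole type-map; only their combination does, which is the role attributed to type conservation in the text.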

7.4.5 Correctness of the Algorithm 

We model the entire computation including the initial program loading/linking phase using a 
partial execution tree given by Definition 7.1. The program loading and linking phase constructs
the static and the dynamic environment within which the top-level query expression is evaluated.
This is not part of the real reconstruction process because it is performed before initiating the
execution of the top-level query expression. However, in our theoretical formulation it is simpler
to start with empty static and dynamic environments, and an empty store that are consistent
with each other by definition.

Each loading/linking step adds a new let-binding to the partial execution tree and the
type derivation tree, adding its type and value to the static and the dynamic environments
respectively. Since each binding is well typed, it follows from the Soundness Theorem 3.16 that
we end up in a store typing $S_0$ such that each top-level binding value is consistent with its
corresponding type, and that the static environment $E_0$, the dynamic environment $e_0$, and the
store $s_0$ obtained after loading/linking are also consistent:

$$S_0 \models e_0 : E_0 \quad \text{and} \quad \models s_0 : S_0 \qquad (7.1)$$

Now we are ready to show that the reconstruction algorithm of Figure 7.5 is correct, i.e.,
given a logical partial execution tree as given by Definition 7.1, it computes the corresponding
logical type derivation tree as given by Definition 7.2.

Theorem 7.3 (Correctness of Type Reconstruction) The reconstruction algorithm shown 
in Figure 7.5, when applied to the complete dynamic activation tree at any time during program 
execution, produces the exact types for every value computed until that time. 

Proof: by induction on the size of the partial execution tree (Definition 7.1). Since the
top-level query is guaranteed to be well-typed, we start with its type derivation tree. Looking






at the static inference rules shown in Figure 3.2 and the dynamic inference rules shown in
Figure 3.1, it is clear that the structure of the type derivation tree must correspond to the
partial execution tree except possibly at the APP or the ABS rules where the number of
judgment antecedents differs between the static and the dynamic rules. We recurse down the
partial execution tree and the current type derivation tree simultaneously in a depth-first,
leftmost-first manner, arguing by case analysis on the inference rules that lead to completed
evaluation judgments.

Case 1: Rules other than ABS or APP. Equation 7.1 shows that we start with a consistent
set of environments and an initial store. For each sub-expression that has been evaluated
in sequence, the Soundness Theorem 3.16 guarantees that its value $v_i$ present in the partial
execution tree would be consistent with the corresponding type $\tau_i$ present in the type
derivation tree. Furthermore, we can construct a chain of extensions to the initial store
typing, $S_i$ extending $S_{i-1}$ extending $\cdots$ $S_0$, each of which would be consistent with the
corresponding store $s_i, s_{i-1}, \ldots, s_0$. If any of these intermediate values entered the dynamic
environment through a let-binding, then the static environment $E_i$ and the dynamic
environment $e_i$ so obtained would also be consistent by construction. Therefore, for each
sub-expression evaluation judgment we have,

$$S_i \models v_i : \tau_i \qquad S_i \models e_i : E_i \qquad \models s_i : S_i \qquad (7.2)$$

Case 2: ABS Rule. Here, we simply clip the type derivation tree at the abstraction typing
judgment in order to emulate the structure of the partial execution tree which produces a
function closure immediately. The type-correctness of the function body ensures a consistent
static type for the closure by definition of $\models$ (Definition 3.12).

Case 3: APP Rule. This is the interesting case of type reconstruction. By the induction
hypothesis, the function and the argument expressions evaluate to a closure and a value
respectively that are consistent with their types present in the type derivation tree. Furthermore,
suppose the base function $f$ present within the closure¹¹ has arity $k$ with formal
parameters $x_1 \cdots x_k$. We need to consider two cases: partial application of the closure to
one more argument, and the final application of the closure that generates a new activation
frame.

If the current application is a partial application of a closure $(\mathsf{cls}, f^i, x_{k-i+1}, a_f, e_f^{k-i})$ to
the value $v_{k-i+1}$, then it immediately produces another closure $(\mathsf{cls}, f^{i-1}, x_{k-i+2}, a_f, e_f^{k-i+1})$,
where $e_f^{k-i+1} = e_f^{k-i} + \{x_{k-i+1} \mapsto v_{k-i+1}\}$. The type consistency of this value with respect
to the result closure type recorded in the type derivation tree follows directly from the induction
hypothesis. The important point to note is that if some type-variable in the resulting
closure type was not being conserved at this application site, then its exact type-hint would
also have been supplied at this application site and stored within the closure environment
$e_f^{k-i+1}$.

Now suppose the function has already undergone $k - 1$ partial applications before this
application to produce a function closure $(\mathsf{cls}, f^1, x_k, a_f, e_f^{k-1})$. Therefore, the dynamic



¹¹ The simple expression language of Chapter 3 does not deal with multi-arity functions directly. Therefore,
we assume that each multi-arity function $f$ with arity $k$ in the user program gives rise to a set of functions
$f^k, f^{k-1}, \ldots, f^1$ that represent partially applied closures of $f$ accumulating one argument at a time within their
environments $e_f^0 \cdots e_f^{k-1}$. The superscript $i$ on the function $f^i$ denotes how many more arguments are needed
before the evaluation of the body of the function $f$ is initiated. Likewise, the superscript $j$ on the environment
$e_f^j$ denotes the number of arguments it has accumulated.






APP rule in the caller's partial execution tree looks like,

$$\frac{\begin{array}{c} e_i \vdash a_1/s_i \Rightarrow (\mathsf{cls}, f^1, x_k, a_f, e_f^{k-1})/s_{i+1} \\ e_i \vdash a_2/s_{i+1} \Rightarrow v_k/s_{i+2} \\ e_f^{k-1} + \{x_k \mapsto v_k,\ f \mapsto (\mathsf{cls}, f^k, x_1, a_f, e_f^0)\} \vdash a_f/s_{i+2} \Rightarrow \_/\_ \end{array}}{e_i \vdash (a_1\ a_2)/s_i \Rightarrow \_/\_}$$

The static APP rule in the caller's type derivation tree constructed so far looks like,

$$E_i \vdash a_1 : \tau_k \to \tau_{k+1} \qquad E_i \vdash a_2 : \tau_k$$

We wish to construct an appropriate type derivation sub-tree that models the evaluation 
of the callee's body. 

From the induction hypothesis on the first two clauses and the Soundness Theorem 3.16,
we obtain new store typings $S_{i+1}$ and $S_{i+2}$ such that,

$$S_{i+1} \models (\mathsf{cls}, f^1, x_k, a_f, e_f^{k-1}) : \tau_k \to \tau_{k+1} \qquad S_{i+1} \text{ extends } S_i \qquad \models s_{i+1} : S_{i+1} \qquad (7.3)$$

$$S_{i+2} \models v_k : \tau_k \qquad S_{i+2} \text{ extends } S_{i+1} \qquad \models s_{i+2} : S_{i+2} \qquad (7.4)$$

Looking at the definition of $\models$ (Definition 3.12), the first clause of Equation 7.3 guarantees
that there exists a suitable type environment $E_f^{k-1}$ that is consistent with the closure
environment $e_f^{k-1}$ and provides a proper typing for the function body. That is,

$$S_{i+1} \models e_f^{k-1} : E_f^{k-1} \qquad (7.5)$$

and

$$E_f^{k-1} \vdash (f^1 \text{ where } f(x_1 \cdots x_k) = a_f) : \tau_k \to \tau_{k+1} \;\Rightarrow\; E_f^{k-1} + \{x_k \mapsto \tau_k\} \vdash a_f : \tau_{k+1} \qquad (7.6)$$

The job of the reconstruction algorithm is to construct the type environment $E_f^{k-1}$ and
hence build the exact type derivation tree of the function body as given by Equation 7.6.

At compile-time, the static type-map of the function $TM_f$ has already recorded the
static type of all the parameters and free identifiers of the function $f$ (Definition 6.1). The
reconstruction algorithm simply needs to instantiate this compile-time type environment
$E_{static}$ to compute the actual type environment $E_f^{k-1}$ as discussed in Section 7.4.4 above.
In particular, the algorithm uses the exact type $\tau_k$ of the final argument $x_k$ from the
application site as well as type-hints contained within the closure environment $e_f^{k-1}$ that
allow it to compute the exact types of all the non-conserved type-variables in the type-map
$TM_f$. This completely instantiates the types of all the accumulated arguments
$x_1 : \tau_1, \ldots, x_{k-1} : \tau_{k-1}$ and the free identifiers contained within the closure environment $e_f^{k-1}$.

Having constructed the type environment $E_f^{k-1}$ as above, we can now instantiate the
type derivation tree of the body $a_f$ as shown in Equation 7.6. Now it remains to be shown
that this type derivation tree is consistent with the evaluation tree of the function body
$a_f$.

We have the following environments,

$$e_f^k = e_f^{k-1} + \{x_k \mapsto v_k\} \qquad E_f^k = E_f^{k-1} + \{x_k \mapsto \tau_k\}$$

Note that all argument and free identifier values contained within the closure environment
$e_f^{k-1}$ must be consistent with the type present within the instantiated type






environment $E_f^{k-1}$ under the store typing $S_{i+1}$, i.e., the constructed environment $E_f^{k-1}$
satisfies Equation 7.5. This is because these values have been computed in the earlier
part of the evaluation tree which we have already type reconstructed and verified for
consistency (Equation 7.2). Since the current store typing $S_{i+2}$ extends $S_{i+1}$, we have
$S_{i+2} \models e_f^{k-1} : E_f^{k-1}$, which is combined with Equation 7.3 and Equation 7.4 to give
$S_{i+2} \models e_f^k : E_f^k$. Together with Equation 7.4 and Equation 7.6, we obtain via the Soundness
Theorem 3.16 that the evaluation of the function body $a_f$ will be consistent with its
type elaboration.

Thus, we have successfully reconstructed a consistent type derivation tree shown in
Equation 7.6 for the expansion of the partial execution tree due to an arity-satisfied function 
application within the current frame. 

□ 






Chapter 8 

Application Study: Tagless 
Garbage Collection 



In this chapter we study an important application of type reconstruction: Tagless Garbage 
Collection. We describe the compile-time and run-time support needed to perform garbage 
collection for a polymorphic language without any type-tags. We have implemented our scheme 
for the Id language running on a simulator for the *T multi-processor architecture. We describe 
this implementation and compare its performance with two other storage management schemes: 
first, a conservative garbage collector that does not use any type information, and second, a
compiler-directed storage reclamation scheme that explicitly deallocates objects based on static 
life-time analysis. 

8.1 Introduction 

Dynamic memory management is an integral component of modern programming languages 
such as C, Common Lisp, Standard ML, and Haskell that support the notion of a globally
shared heap of objects. It is possible to manage the heap memory manually by means of
explicit allocation and deallocation calls, though manual storage reclamation is often a difficult
and error-prone process. Usually, it is more convenient to use some automatic mechanism for
storage reclamation such as an independent garbage collector that reclaims storage periodically 
once it is no longer in use. 

Traditionally, run-time systems geared towards automatic garbage collection use a tagged
object representation model [App90, Wil92]. This enables the garbage collector to distinguish
between scalar objects and pointers to heap objects without any support from the user or the
compiler, although the user application has to pay the price of tagging and boxing objects and
performing continuous tag maintenance.

Recently, storage reclamation techniques with an untagged object representation model
have received much attention. The motivation comes from a desire to use the full pointer
addressability and native representation for scalars rather than a tagged representation, and
to avoid the overhead of continuous tag maintenance. Some techniques, such as conservative
garbage collection [Bar88, BW88] and compiler-directed storage reclamation [HJ92, Hic93], do
not use any run-time type information. In contrast, garbage collection based on type reconstruction
[App89, Gol91, GG92] or explicit type propagation [Tol94] uses source type information for
identifying and traversing live heap objects. In this chapter, we will study and compare the
performance of some of these techniques with a scheme based on full run-time type reconstruction.

8.1.1 Storage Reclamation without Run-time Type Information 

In an untagged run-time system, no explicit type information is available at run-time in order
to identify and traverse live objects. Still, it is possible to perform garbage collection using
a conservative object identification strategy as shown by Boehm and Weiser [BW88]. In this
scheme, the garbage collector guesses whether a given value is a scalar or a pointer to a heap
object. Typically, the guess is based on certain assumptions about the location and alignment
of actual pointer data. Since the guess is conservative, the garbage collector may assume some
objects to be live when they are dead and fail to collect them. It may also be possible to 
compact or copy part of the live data that is definitely known to reside on the heap as shown by 
Bartlett [Bar88]. The feasibility and efficiency of such schemes depend crucially on the object 
representation convention used within the run-time system and the possibility of obscuring 
pointer/non-pointer information within the source language and the compiler. 
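
The kind of guess made by a conservative collector can be sketched in Python as follows. The test used here (the word lies within the heap bounds and is word-aligned) is only one plausible instance of the "location and alignment" assumptions mentioned above; the names and the 8-byte word size are assumptions of the sketch and are not taken from [BW88].

```python
WORD = 8  # assumed word size in bytes


def looks_like_pointer(word, heap_base, heap_limit):
    """Conservative guess: any aligned word inside the heap is treated as a pointer."""
    return heap_base <= word < heap_limit and word % WORD == 0


def conservative_roots(words, heap_base, heap_limit):
    """Select the words of a frame or object that must be treated as live references."""
    return [w for w in words if looks_like_pointer(w, heap_base, heap_limit)]


frame_words = [42, 0x1000, 0x1004, 0x1FF8, 0x2000]
roots = conservative_roots(frame_words, heap_base=0x1000, heap_limit=0x2000)
```

An integer scalar that happens to equal an aligned heap address passes the same test, which is exactly how dead objects can be retained under this strategy.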

In another scheme proposed by Hicks [HJ92, Hic93], the compiler performs life-time analysis
of objects and automatically inserts explicit deallocation calls for an object that is determined
to be dead at a particular point in the program. The compile-time cost of this analysis is
substantial since the proposed scheme performs abstract interpretation over the entire program
in order to determine the reference patterns of dynamically allocated objects and to approximate
their life-times statically. However, once an object has been determined to be garbage, the
run-time cost of deallocating it at an appropriate program point is minimal. Since static analysis
is necessarily approximate due to undetermined control flow and sharing or aliasing of objects,
this technique is also unable to reclaim all the garbage generated within the program.

8.1.2 Garbage Collection using Run-time Type Reconstruction 

The primary motivation for a type-reconstruction-based garbage collection scheme is to take
advantage of the wealth of compile-time type information available in a statically-typed language
in optimizing its run-time performance. In particular, it is possible in such a system to
use an untagged and unboxed representation for scalar objects and eliminate type headers for
heap objects without compromising the ability to perform complete object identification. All
the desired type information may be automatically reconstructed when necessary. Although
the cost of type reconstruction may be significant, it needs to be paid only when garbage collection
is initiated. Therefore, such a scheme may work very well for scientific applications
where numerical performance is of prime concern and garbage collection is expected to happen
infrequently and is used in conjunction with explicit storage management. Keeping tagless
data also permits easy inter-operability with conventional C and Fortran libraries that do not
support tags.

Full run-time type reconstruction also offers some unique advantages that are not present 
in other schemes for storage reclamation. Having the exact run-time types of objects allows the 
garbage collector to examine and traverse objects selectively. For example, the collector need
not search for heap pointers inside a large array of floating point numbers. Similarly, the scalar
fields of a record may be safely skipped. For scientific applications manipulating large numeric
arrays, this may constitute a substantial saving in identifying the set of all live objects.
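
The saving described above can be made concrete with a small Python sketch of type-directed marking. The heap model (a dictionary from addresses to lists of words), the tuple encoding of types, and the `scanned` counter are all assumptions of this illustration; the sketch interprets the type while marking and skips every word whose type says it is a scalar.

```python
def mark(heap, word, ty, marked, stats):
    """Mark the object referenced by word, guided by its reconstructed type ty."""
    if isinstance(ty, str):          # scalar type: word is not a pointer
        return
    if word in marked:
        return
    marked.add(word)
    kind = ty[0]
    if kind == "array":
        elem_ty = ty[1]
        if isinstance(elem_ty, str): # array of scalars: skip its contents entirely
            return
        for w in heap[word]:
            stats["scanned"] += 1
            mark(heap, w, elem_ty, marked, stats)
    elif kind == "record":
        for w, field_ty in zip(heap[word], ty[1]):
            if isinstance(field_ty, str):
                continue             # scalar field: safely skipped
            stats["scanned"] += 1
            mark(heap, w, field_ty, marked, stats)
    else:
        raise ValueError(f"unknown type constructor {kind!r}")


FLOATS = ("array", "float")
REC = ("record", ("int", FLOATS))
heap = {100: [1.0, 2.0, 3.0], 200: [7, 100], 300: [200, 200]}
marked, stats = set(), {"scanned": 0}
mark(heap, 300, ("array", REC), marked, stats)
```

In this example the heap holds seven words, but only the three words typed as pointers are examined; the floating point array at address 100 is marked without being scanned.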

It is also quite easy in this scheme to generate specialized traversal and marking functions 
for user-defined objects and function activation frames that understand their type and control 
structure. These functions selectively traverse the fields that point to heap objects as determined 






by their types, and mark those objects as live. Since these functions are specialized to the type
of a particular object, they may be more efficient than interpreting the run-time reconstructed
types of the objects.
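
One way to picture such specialized functions is as closures generated once per type, so that the decision about which fields to follow is made at generation time rather than at every visit. The Python sketch below is a hypothetical illustration restricted to non-recursive types; `make_marker`, the heap model, and the type encoding are assumptions and do not describe the generated code of the actual implementation.

```python
def make_marker(ty):
    """Build a marking function specialized to ty (None for scalar types)."""
    if isinstance(ty, str):
        return None                          # scalars need no marking code
    kind = ty[0]
    if kind == "record":
        # Decide once which fields hold pointers and how to mark each of them.
        pointer_fields = [(i, make_marker(f))
                          for i, f in enumerate(ty[1]) if not isinstance(f, str)]

        def mark_record(heap, addr, marked):
            if addr in marked:
                return
            marked.add(addr)
            obj = heap[addr]
            for i, mark_field in pointer_fields:
                mark_field(heap, obj[i], marked)
        return mark_record
    if kind == "array":
        mark_elem = make_marker(ty[1])

        def mark_array(heap, addr, marked):
            if addr in marked:
                return
            marked.add(addr)
            if mark_elem is None:            # scalar elements: nothing to follow
                return
            for w in heap[addr]:
                mark_elem(heap, w, marked)
        return mark_array
    raise ValueError(f"unknown type constructor {kind!r}")


REC = ("record", ("int", ("array", "float")))
heap = {100: [1.0, 2.0, 3.0], 200: [7, 100], 300: [200, 200]}
mark_root = make_marker(("array", REC))
live = set()
mark_root(heap, 300, live)
```

The generated `mark_root` never inspects a type term while traversing the heap, in contrast to an interpretive marker that dispatches on the type at each object.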

8.1.3 Related Work 

Goldberg and Gloger used type reconstruction to garbage collect a polymorphic language
[GG92]. However, their system did not guarantee complete type reconstruction. In a situation
where a polymorphic function accessed only part of a complex object (see Section 6.1.2), say
the spine of a linked list, their system could not determine the full type of the object and
therefore could not traverse it completely. The authors argued that the inaccessible parts of
the object were garbage anyway and therefore need not be marked as live. Unfortunately,
the object could have shared references from other sources that access it farther than the first
reference. To deal with such cases, the authors proposed maintaining hash tables of partially
traversed data-structures as a way of identifying the extent to which an object was live and
therefore should not be garbage collected. This scheme was both cumbersome and costly. On
the other hand, our scheme of full type reconstruction allows the garbage collector to traverse
the whole object the very first time without using any additional data-structures.

Another interesting scheme has been proposed by Tolmach [Tol94] where type instantiation
and propagation is made explicit in the program by converting it into an intermediate form based
on the second-order λ-calculus [Rey74, HM93]. Under this transformation, every polymorphic
object is parameterized with explicit type parameters for each of its polymorphic type-variables
that are instantiated at the time of application to actual type arguments. This explicit run-time
type information is used during garbage collection in much the same way as in our scheme. A
minor problem in using this scheme is that in order to preserve the call-by-value semantics of
ML-like programs, the polymorphic objects appearing on the right-hand-side of a let-binding
must be restricted to syntactic values, i.e., identifiers, constants, or λ-expressions. Wright
showed [Wri93] that this restriction is not too serious in practice.

The explicit type parameters used in Tolmach's system are similar in spirit to the explicit
type-hints of our type reconstruction scheme, although we add explicit type parameters only
for non-conserved type-variables. Our scheme can be considered as an optimal trade-off point
between Goldberg's scheme where no explicit type information is propagated at run-time, and
Tolmach's scheme where all polymorphic type-variables are instantiated using explicit run-time
parameters. We insert explicit type parameters only where necessary assuming that the cost of
reconstructing the remaining information at run-time is small.

8.1.4 Goals and Scope of the Study 

The main goal of this study is to establish the feasibility of a type-reconstruction based tagless 
garbage collection scheme (TRGC) and to compare its performance with a conservative garbage 
collection scheme (CGC) and a compiler-directed storage reclamation scheme (CDSR) that does 
explicit deallocation. 

In order to make a reasonable performance comparison, we have implemented all three
schemes for the same source language, compiler, and target architecture. Our source language
is Id, which is a polymorphic, strongly-typed, implicitly parallel programming language
[Nik91]. We are compiling Id for the *T multiprocessor architecture [NPA92, PBGB93] and
executing it on an emulator for that machine.

We have chosen a very simple "mark-and-sweep" garbage collection algorithm so that the 






cost of object identification can be clearly identified during the mark phase. The wall clock
performance of the garbage collection algorithm is not our major concern; we are primarily
interested in the relative cost of type reconstruction and marking vs. the cost of conservative
marking. The explicit allocation/deallocation scheme serves as a calibration point representing the
essential cost of managing the storage.
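
To show where object identification enters a mark-and-sweep collector, the following Python sketch takes the identification step as a parameter. It is a generic textbook-style outline under assumed names (`mark_and_sweep`, `pointers_of`) and an assumed heap model (a dictionary from addresses to objects), not the collector implemented for *T; a conservative or a type-directed strategy would be plugged in through `pointers_of`.

```python
def mark_and_sweep(heap, roots, pointers_of):
    """Mark everything reachable from roots, then delete the rest of the heap.

    pointers_of(obj) returns the addresses that obj refers to; it stands for
    the object identification strategy whose cost is being compared.
    Returns the number of objects that had to be identified during marking.
    """
    marked, pending, identified = set(), list(roots), 0
    while pending:                       # mark phase
        addr = pending.pop()
        if addr in marked:
            continue
        marked.add(addr)
        identified += 1
        pending.extend(pointers_of(heap[addr]))
    for addr in list(heap):              # sweep phase
        if addr not in marked:
            del heap[addr]
    return identified


# Toy heap in which every object is simply the list of addresses it points to.
heap = {1: [2], 2: [], 3: [1], 4: [4]}
count = mark_and_sweep(heap, roots=[1], pointers_of=lambda obj: obj)
```

Keeping the sweep phase fixed while varying only `pointers_of` isolates the cost of identification in the mark phase.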

8.1.5 Outline 

The outline of the rest of the chapter is as follows. Section 8.2 describes the object represen- 
tation model in Id and summarizes the overall strategy for mark-and-sweep garbage collection 
based on run-time type reconstruction. Section 8.3 describes the compiler support required. 
In Section 8.4, we describe the run-time object marking schema based on complete type
reconstruction. In Section 8.5, we briefly describe the *T multi-threaded architecture and our
implementation of the various storage management schemes on it. Section 8.6 discusses our
benchmarks and presents the performance results. Finally, Section 8.7 presents the conclusions.

8.2 Framework for Tagless Garbage Collection 

8.2.1 Object Representations and the Memory Model 

The Kernel Id intermediate language as shown in Figure 7.1 is an abstract intermediate form 
that does not take a position on the underlying representation of objects. However, a concrete
implementation of a language must specify a representation of objects, which to a large extent
determines its run-time performance and the garbage collection strategy. In this section, we
describe the concrete representation of Id objects for our current implementation.

The object representation used in the Id run-time system is independent of the target 
architecture and only relies upon the assumption of a logically flat, shared, global address space.
In order to keep the representation simple and efficient we avoid making any assumptions about 
boxing and explicit tagging of objects as much as possible. The only assumption necessary to 
support polymorphism is that we use the same basic unit of memory for all scalar objects and 
pointers to heap objects which in our case is a single 64-bit word. 

Examples of various Id object representations appear in Figure 8.1. Scalar objects are by
definition untagged and unboxed in Id. n-dimensional arrays are linearized in row-major order
into a flat data-structure that also keeps the bounds in each dimension (l_1, u_1), ..., (l_n, u_n)
and a set of linearization constants c_0, ..., c_{n-1} that are used to compute the linear offset into
the array given an n-dimensional index. For an algebraic datatype, depending on the total
number m and the arity k_m of its various disjuncts, we may choose one of product, enumerated,
implicit, or explicit representation. In all cases except when there is more than one non-nullary
disjunct present, we are able to choose an unboxed and untagged representation for
the datatype. In particular, when there is exactly one non-nullary disjunct present, as in the
case of the list datatype, we assume that heap pointers can be distinguished from a small fixed
range of integers (say, 0-255), sufficient to represent all the nullary disjuncts of the datatype,
and no explicit tag is necessary. For some applications, this may save a lot of space and time.
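The choice among the four algebraic representations, and the row-major linearization of array indices, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the offset computation derives its strides directly from the bounds; how the run-time system actually precomputes and uses the constants c_0, ..., c_{n-1} is not specified here, so that part is an assumption.

```python
from typing import List, Sequence, Tuple

def choose_representation(arities: List[int]) -> str:
    """Classify a datatype by the arities of its disjuncts (constructors)."""
    if not arities:
        raise ValueError("a datatype needs at least one disjunct")
    non_nullary = sum(1 for k in arities if k > 0)
    if len(arities) == 1:
        return "product"      # one disjunct: fields stored directly, no tag
    if non_nullary == 0:
        return "enumerated"   # all nullary: each disjunct is a small integer
    if non_nullary == 1:
        return "implicit"     # heap pointer vs. small integer tells the cases apart
    return "explicit"         # several non-nullary disjuncts: a tag is required

def row_major_offset(bounds: Sequence[Tuple[int, int]], index: Sequence[int]) -> int:
    """Linear offset of `index` in an array with inclusive per-dimension bounds."""
    if len(bounds) != len(index):
        raise ValueError("index rank does not match array rank")
    offset = 0
    for (lo, hi), i in zip(bounds, index):
        if not lo <= i <= hi:
            raise IndexError(f"index {i} outside bounds ({lo}, {hi})")
        offset = offset * (hi - lo + 1) + (i - lo)   # Horner form of row-major order
    return offset

print(choose_representation([2]))        # point = Pt int int
print(choose_representation([0, 0]))     # bool  = False | True
print(choose_representation([0, 2]))     # list  = Nil | Cons *0 (list *0)
print(choose_representation([0, 1, 1]))  # token = Eof | Tk1 int | Tk2 float
print(row_major_offset([(1, 3), (1, 4)], (2, 3)))
```

The four calls correspond to the four example datatypes of Figure 8.1; only the last one requires an explicit tag.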

There are two more kinds of objects that are created and manipulated indirectly at run-time
by Id programs. These are function closures and activation frames. In an implementation
without lambda-lifting and currying, function closures keep the values of the free identifiers
of a function obtained from its lexical environment. In our implementation, all functions are
already lambda-lifted, so the closures carry just the curried arguments accumulated under
partial applications. We use the structure depicted in Figure 8.1 which permits sharing of
intermediate closures.

[Figure 8.1 (diagram not reproduced) depicts the following layouts.]

  Scalar:            single word (unboxed and untagged)
  2d_array:          | c_0 | c_1 | l_1 | u_1 | l_2 | u_2 | elements ... |   (linearized; bounds, then elements)
  Algebraic Type:
    Product:         type point = Pt int int;                  (1 disjunct)
    Enumerated:      type bool = False | True;                 (all nullary disjuncts)
    Implicit:        type list *0 = Nil | Cons *0 (list *0);   (1 non-nullary disjunct)
    Explicit:        type token = Eof | Tk1 int | Tk2 float;   (> 1 non-nullary disjuncts)
  Function Closure:  def F x1 ... xn = E;  (F x1 ... xk)
                     linked cells labeled F with remaining arities n-k, ..., n-1, n
                     and accumulated arguments xk, ..., x1
  Activation Frame:  fields F, Size, Args (x1 ... xn), Locals, and Return Cont.

Figure 8.1: Run-time Object Representations for Id.
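A minimal Python sketch of a closure chain that shares intermediate closures is given below. The field names (`code`, `remaining`, `arg`, `prev`) and the helper `apply_closure` are hypothetical; the sketch only assumes, following Figure 8.1, that each partial application adds one cell holding the code pointer, the remaining arity, and one argument, linked to the closure it extends.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class Closure:
    code: Callable[..., Any]           # code-block pointer F
    remaining: int                     # arguments still missing (n - k)
    arg: Any = None                    # argument added by this cell (x_k)
    prev: Optional["Closure"] = None   # closure holding x_1 ... x_{k-1}

def apply_closure(clo: Closure, arg: Any) -> Any:
    """Apply a closure to one more argument."""
    if clo.remaining > 1:
        # Still partial: allocate one new cell that points at the old closure.
        return Closure(clo.code, clo.remaining - 1, arg, clo)
    # Arity satisfied: gather the arguments along the chain and call the code.
    args = [arg]
    cell = clo
    while cell.prev is not None:
        args.append(cell.arg)
        cell = cell.prev
    return clo.code(*reversed(args))

add3 = Closure(lambda x, y, z: x + y + z, 3)
one = apply_closure(add3, 1)      # closure for (add3 1)
a = apply_closure(one, 2)         # closure for (add3 1 2)
b = apply_closure(one, 10)        # closure for (add3 1 10), sharing `one`
print(a.prev is b.prev, apply_closure(a, 3), apply_closure(b, 3))
```

Both `a` and `b` point at the same intermediate cell `one`, which is the sharing that a flat copy of the accumulated arguments would not provide.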

An activation frame is a temporary storage area used by an executing function as a scratch
pad keeping its input arguments and temporary intermediate values. In Kernel Id, the bound
variables of a function constitute the intermediate values that need to be kept within its
activation frame for future use.¹ The frame also keeps the return continuation, consisting of the
caller's activation frame and the return instruction pointer. In a sequential system, activation
frames are usually allocated on a stack. In our parallel execution model, the linear stack of
activation frames generalizes to a tree and is managed explicitly by the run-time system.



1 An intelligent compiler back-end may be able to share some frame slots based on live-variable analysis, but 
we are ignoring that issue here for simplicity. 






8.2.2 Overall Strategy 

The overall strategy for a simple mark-and-sweep garbage collection based on run-time type
reconstruction is summarized below and described in the following sections:

1. At compile-time, we ensure that every object manipulated by the user program (including
function closures and activation frames) is assigned a static, possibly polymorphic, datatype
that accurately describes the structure of that object (Section 8.3).

2. When the garbage collector is invoked at run-time, first we reconstruct the type of every
activation frame present within the current dynamic call tree using the algorithm described
in the last chapter. The reconstruction mechanism instantiates the compile-time type
description of each activation frame to its exact run-time type.

3. Next, within the mark phase of the garbage collector, each slot of a reconstructed frame
is examined and its reconstructed type is used to mark the heap objects reachable from
that slot as live. This may be done in two ways: the reconstructed types may be directly
interpreted to identify and traverse the heap objects, or the compiler may automatically
generate specialized traversal and mark routines that are appropriately composed at run-time
in order to mark the live objects (Section 8.4).

4. Finally, the unmarked heap objects are reclaimed as garbage by sweeping the entire heap.
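Steps 3 and 4 can be sketched in Python under strong simplifying assumptions: the heap is a dictionary from addresses to cells, the only structured type is `("list", elem)` in the implicit representation, and each frame is already type-reconstructed into a list of (type, value) slots. All names are hypothetical and the sketch is not the run-time system's actual code.

```python
NIL = 0   # nullary disjunct of `list`, encoded as a small integer

def mark(ty, value, heap, marked):
    """Mark everything reachable from `value`, whose exact type is `ty`."""
    if ty == "int":
        return                        # scalars are never followed
    kind, elem = ty                   # only ("list", elem) is modeled here
    while value != NIL and value not in marked:
        marked.add(value)
        head, tail = heap[value]      # a Cons cell: (head, tail)
        mark(elem, head, heap, marked)
        value = tail

def collect(frames, heap):
    """Mark from every typed frame slot, then sweep the whole heap."""
    marked = set()
    for frame in frames:
        for ty, value in frame:
            mark(ty, value, heap, marked)
    for addr in [a for a in heap if a not in marked]:
        del heap[addr]                # reclaim unmarked objects
    return marked

heap = {1000: (1, 1008), 1008: (2, NIL), 1016: (99, NIL)}
# The first slot is an int that happens to equal a heap address; its type says
# it is a scalar, so the cell at 1016 is not kept alive by it.
frame = [("int", 1016), (("list", "int"), 1000)]
collect([frame], heap)
print(sorted(heap))
```

Because the slot types decide what is a pointer, no tag bits are needed to tell the integer 1016 apart from the address 1016.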

8.3 Compiler Support for Object Identification 

8.3.1 Visible and Invisible Datatypes 

The scalar basetypes, algebraic datatypes, and array types in Kernel Id correspond to pure
data-objects whose types are directly visible at the source language level. There is a direct, fixed
mapping from the source types of these objects to their internal representations as described in
Section 8.2.1. This mapping may be directly used in traversing these objects at run-time once
their exact source type is determined.

On the other hand, arrow types (->) correspond to two different run-time objects: function
closures, which behave like data-objects that must be garbage collected, and activation frames,
which are control-objects consisting of the live object root set. Neither of these is modeled
completely by the source-level arrow type. This is because the visible type signature of a
function does not provide any clue regarding the types of the arguments hidden inside its
closure, nor does it provide any information about the local variables kept within the function's
activation frame. In order to treat all Id run-time objects uniformly in terms of Id source
types, we define invisible source-level datatypes for function closures and activation frames
that provide an exact description of their contents.

8.3.2 Modeling Function Closures 

In order to simplify the type reconstruction analysis, we model the closures corresponding to
partial applications of a function as disjuncts of an invisible algebraic datatype that is
automatically derived at compile-time from the corresponding function signature. This derivation is
shown in Figure 8.2. The various disjuncts of this hidden datatype represent successive partial
applications of the function and identify the number and the types of the accumulated
arguments. This indirect model captures all the necessary type information required to traverse the
actual run-time representation of a function closure as shown in Figure 8.1. Given a run-time
closure object, we can map it to an algebraic disjunct in this model by examining its function
code-block pointer and the remaining arity slot. Then, given the exact algebraic type of the
closure, the arguments contained within the closure can be traversed using the argument types
of the mapped disjunct.

As an example, below we show a function eqlen that compares the length of two lists. We
also show its Hindley/Milner visible source type and its automatically derived hidden closure
datatype:

Example 8.1:

    def eqlen l1 l2 =                  % eqlen :: ∀αβ. (list α) -> (list β) -> bool
      { len1 = length l1
        len2 = length l2
        p = len1 == len2
       in p };

    type eqlen_closure α β =           % Hidden Closure Type
        eqlen_ap0
      | eqlen_ap1 (list α);

    f = eqlen (1:nil);                 % f :: ∀β. (list β) -> bool
                                       % f :: ∀β. (eqlen_closure int β)

The constructor eqlen_ap0 models the closure representation of the eqlen function itself,
while eqlen_ap1 represents the closure formed by a partial application of the eqlen function
to one argument. The example also shows the source type and the invisible type of a partial
application of the eqlen function.² Note that the invisible type records the fact that the hidden
first argument within the closure is a list of integers while this information is not present in the
source type.

There is no need to make a closure for eqlen with two arguments since at that point its
arity is fully satisfied and the application gives rise to an activation frame instead of a function
closure.³ Finally, note that the invisible closure datatype is parameterized by all the
type-variables present in the source type of the function. This is necessary in order to model the
exact run-time types of all the arguments contained within the closure.
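The derivation of the hidden closure datatype from a function signature can be sketched in Python as follows. The dictionary layout and the helper names are assumptions made for illustration; only the constructor naming (`eqlen_ap0`, `eqlen_ap1`) and the rule "one disjunct per partial application with fewer than n arguments" come from the text.

```python
def derive_closure_type(name, type_vars, arg_types):
    """Build the hidden closure datatype for a function of len(arg_types) arguments."""
    # Disjunct k records the types of the first k accumulated arguments.
    disjuncts = [(f"{name}_ap{k}", tuple(arg_types[:k]))
                 for k in range(len(arg_types))]
    return {"name": f"{name}_closure",
            "params": tuple(type_vars),
            "disjuncts": disjuncts}

def disjunct_for(closure_type, arity, remaining):
    """Map a run-time closure, identified by its remaining arity, to its disjunct."""
    return closure_type["disjuncts"][arity - remaining]

eqlen_closure = derive_closure_type("eqlen", ["a", "b"], ["list a", "list b"])
print(eqlen_closure["disjuncts"])
print(disjunct_for(eqlen_closure, arity=2, remaining=1))
```

A closure of eqlen whose remaining arity is 1 maps to `eqlen_ap1`, so the single hidden argument is traversed as a `list a` once `a` has been instantiated.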

8.3.3 Modeling Activation Frames 

Function activation frames are modeled using an automatically derived, invisible datatype
called the function framemap as shown in Figure 8.2. This is simply a record datatype with
a field for every actual frame-slot (cf. Figure 8.1). Besides the scalar datatype fields for the
code-block entry point, the frame size, and the return continuation, the framemap records the
types of the function arguments and the local identifiers used within the function body.

Abstractly, the framemap of a function provides a logical subset of the type information
recorded within its type-map (Definition 6.1) and is parameterized by the same type-variables.
The framemap simply provides a concrete static image of a function's dynamic activation frame
and therefore may depend on its actual implementation on a given platform. After type
reconstruction is complete, each activation frame is associated with a fully instantiated type-map
from which an appropriate framemap instance can be derived in order to traverse the heap
objects accessible through each frame-slot.⁴



2 : is the infix cons constructor for lists.

3 However, under delayed or lazy evaluation, we may need to keep track of such thunks.

4 In our current implementation, the type-map produced by the Id compiler is tailored to the structure of



                           Invisible Datatypes

Given a Function Declaration:   def F x_1 ... x_n = E
                                F :: ∀α_1 ... α_m. τ_1 -> ... -> τ_n -> τ_{n+1}

Let (z_1 :: σ_1) ... (z_m :: σ_m) be the locally bound identifiers of E.

1. Define Function Closure Datatype:

       type F_closure α_1 ... α_m = F_ap_0
                                  | F_ap_1 τ_1
                                  ...
                                  | F_ap_{n-1} τ_1 ... τ_{n-1};

2. Define Function Framemap Datatype:

       type F_framemap α_1 ... α_m =
         {record (F :: code)         % Code-Block Entry Point
                 (N :: int)          % Frame Size
                 (R :: cont)         % Return Continuation
                 (x_1 :: τ_1)        % Arguments
                 ...
                 (x_n :: τ_n)
                 (z_1 :: σ_1)        % Local Identifiers
                 ...
                 (z_m :: σ_m) };

Figure 8.2: Automatic Derivation of Invisible Datatypes.



As an example, we show the framemap datatype for the eqlen function given above:

Example 8.2:

    type eqlen_typemap α β =
      {record
         (eqlen :: code)
         (size :: int)
         (retcont :: cont)
         (l1 :: (list α))
         (l2 :: (list β))
         (len1 :: int)
         (len2 :: int)
         (p :: bool) };
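The framemap derivation, together with the instantiation of its type-variables once type reconstruction has fixed them, can be sketched in Python. The representation of types as strings and nested tuples, and the names `derive_framemap`, `substitute`, and `instantiate`, are assumptions of this sketch rather than the compiler's data structures.

```python
def derive_framemap(name, type_vars, args, local_vars):
    """Record type with one field per frame slot, as in Figure 8.2."""
    fields = [(name, "code"), ("size", "int"), ("retcont", "cont")]
    fields += list(args) + list(local_vars)
    return {"name": f"{name}_framemap", "params": tuple(type_vars), "fields": fields}

def substitute(ty, env):
    """Replace type variables (strings bound in env) inside a type term."""
    if isinstance(ty, str):
        return env.get(ty, ty)
    return tuple(substitute(t, env) for t in ty)

def instantiate(framemap, actuals):
    """Instantiate the framemap's type parameters with reconstructed types."""
    env = dict(zip(framemap["params"], actuals))
    return [(field, substitute(ty, env)) for field, ty in framemap["fields"]]

fm = derive_framemap(
    "eqlen", ["a", "b"],
    args=[("l1", ("list", "a")), ("l2", ("list", "b"))],
    local_vars=[("len1", "int"), ("len2", "int"), ("p", "bool")],
)
slots = instantiate(fm, ["int", "float"])   # e.g. a call eqlen (1:nil) (2.0:nil)
for field, ty in slots:
    print(field, ty)
```

After instantiation, every slot carries a closed type, which is what the marking phase needs in order to traverse the objects reachable from `l1` and `l2`.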



8.3.4 Run-time Type Encodings 

Run-time type reconstruction requires an encoding of all the visible and invisible datatypes of a
program that is used to encode type-hints and to represent the exact run-time types of objects
during type reconstruction. We showed such an encoding and decoding scheme in Figure 7.3 in
Chapter 7. In this scheme, each algebraic datatype T^n is encoded into a corresponding static
type descriptor 𝒯^n that contains all the necessary compiler information about its arity, internal
field structure, and its representation.

4 (continued) the activation frames used in the *T run-time system. Therefore, we directly use the
type-map of a function to traverse its activation frame.

Our compiler generates static type descriptors for all the user-defined algebraic datatypes
and the automatically derived closure and framemap datatypes (Figure 8.2) for each declared
function within the program. These static descriptors are linked together with the object
program and are used by the run-time system during type reconstruction. Run-time types
are encoded as a flat array of static type descriptors using back-pointers to preserve sharing.
This representation permits very efficient copying, unification, and instantiation operations on
encoded types. The packing and unpacking of these encoded types is carried out on the fly
within the run-time system.
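One way to picture a flat encoding with back-pointers is the following Python sketch. The exact array layout used by the run-time system is not described in this section, so the entry formats `("desc", name, arity)` and `("ref", index)` are purely illustrative assumptions; the sketch only demonstrates a prefix-order flattening in which a repeated structured subterm is replaced by a pointer to its first occurrence.

```python
def encode(ty):
    """Flatten a type term (nested tuples) into a list of descriptor entries."""
    flat, seen = [], {}

    def go(t):
        if len(t) > 1 and t in seen:           # share only structured subterms
            flat.append(("ref", seen[t]))      # back-pointer to first occurrence
            return
        seen[t] = len(flat)
        flat.append(("desc", t[0], len(t) - 1))
        for arg in t[1:]:
            go(arg)

    go(ty)
    return flat

def decode(flat):
    """Rebuild the type term from its flat encoding."""
    def at(pos):
        entry = flat[pos]
        if entry[0] == "ref":
            term, _ = at(entry[1])             # follow the back-pointer
            return term, pos + 1
        _, name, arity = entry
        args, nxt = [], pos + 1
        for _ in range(arity):
            arg, nxt = at(nxt)
            args.append(arg)
        return (name, *args), nxt

    return at(0)[0]

ty = ("pair", ("list", ("int",)), ("list", ("int",)))
flat = encode(ty)
print(flat)
print(decode(flat) == ty)
```

In this toy layout the second `(list int)` occupies a single entry instead of two, and copying an encoded type is a plain array copy.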

8.4 Run-time Object Traversal and Marking 

In this section, we describe our scheme for object traversal and marking assuming complete
type reconstruction has been performed. We present two mechanisms:

Interpreted Marking - In this mechanism, the encoded types generated by type reconstruction
are directly used to guide the traversal and marking of the heap objects.

Compiled Marking - In this mechanism, the compiler automatically generates marking functions
for each datatype in the program based solely on the static type information. These
functions are appropriately composed at run-time using the reconstructed types and then
directly applied to the corresponding objects.

Both mechanisms are specified as a set of mark functions, one for each basetype, array type,
and algebraic datatype present in the program. The algebraic datatype could be a user-defined
datatype (Figure 7.1) or an invisible datatype defined by the compiler for function closures and
activation frames (Figure 8.2).

8.4.1 Interpreted Marking 

The Interpreted Marking Schema ℳ for a type T^n is shown in Figure 8.3. In this schema,
for each type T^n with n type parameters α_1 ... α_n, we define a mark function mark_T that is
parameterized by n corresponding encoded type arguments z_1 ... z_n. At run-time, this function
is supplied with the exact encoded type instantiation of its type parameters, say τ_1 ... τ_n, which
produces an appropriate marking function for an object with type (T^n τ_1 ... τ_n).

The internal structure of the mark functions closely follows the structure of their
corresponding datatypes. The polymorphic, bound type-variables of a type-scheme are mapped to
dummy mark functions because polymorphic objects contain no information. Similarly, all our
base types are scalars, so the mark functions for them do nothing. The mark functions for
arrays and algebraic datatypes first mark the object itself and then proceed to mark its internal
components. This is achieved by first computing the exact run-time type encoding for each of
the components and then interpreting that encoding. The code to compute the exact type
encoding is directly compiled into the mark functions using the TEnc⟦⟧ scheme shown earlier
(Figure 7.3).

The overall process of interpretive marking is governed by the top-level type-code
interpretation function shown in Figure 8.4. Here, we have generalized the type-code interpretation
scheme Interpret⟦⟧ for an arbitrary datatype schema ℛ such as the marking schema ℳ of
Figure 8.3. This process unpacks the encoded type and invokes the schema function for the
appropriate type descriptor, passing it the rest of the encoded type arguments. In the present
case, (Interpret⟦ℳ⟧ τ x) traverses and marks the object x according to its exact run-time
type encoding τ by recursively instantiating and invoking the mark functions associated with
the type descriptors in τ. Other structured datatype schemas such as a printing schema or an
I/O schema may also be defined and interpreted in a similar manner.

                           Marking Schema ℳ

Given a polymorphic type-variable α_i, define ℳ⟦T^0_{α_i}⟧ = mark_T_{α_i}, where
    def mark_T_{α_i} () = λx.()

Given a Type T^n, define ℳ⟦T^n⟧ = mark_T, where

1. T^0 is a BaseType (int | float):

       def mark_T () = λx.()

2. T^1 is an Array Type (nd_array α):

       def mark_nd_array (z) =
         λa.{ Mark(a);
              (l_1, u_1), ..., (l_n, u_n) = bounds(a);
              for i_1 <- l_1 to u_1 do
                ...
                  for i_n <- l_n to u_n do
                    Interpret⟦ℳ⟧ (TEnc⟦α⟧ {α ↦ z}) a[i_1, ..., i_n];
            }

3. T^n is an Algebraic DataType (T^n α_1 ... α_n):

       def mark_T (z_1, ..., z_n) =
         λx.{ Mark(x);
              Case_T x of
                C_1 x_1 ... x_{k_1} = { Interpret⟦ℳ⟧ (TEnc⟦τ_{1,1}⟧ {α_i ↦ z_i}) x_1;
                                        ...
                                        Interpret⟦ℳ⟧ (TEnc⟦τ_{1,k_1}⟧ {α_i ↦ z_i}) x_{k_1}; }
                ...
              | C_m x_1 ... x_{k_m} = { Interpret⟦ℳ⟧ (TEnc⟦τ_{m,1}⟧ {α_i ↦ z_i}) x_1;
                                        ...
                                        Interpret⟦ℳ⟧ (TEnc⟦τ_{m,k_m}⟧ {α_i ↦ z_i}) x_{k_m}; }
            }

Figure 8.3: Generating Mark Functions for Datatypes.

Given a Datatype Schema ℛ, define

    Interpret⟦ℛ⟧ τ = { Case head(τ) of
                         𝒯_1^n = (ℛ⟦T_1^n⟧) args_n(τ)
                       | 𝒯_2^m = (ℛ⟦T_2^m⟧) args_m(τ)
                       | ... }

Figure 8.4: Type-code Interpretation at Run-time.

Given a Datatype Schema ℛ and a Translation Environment Γ_ℛ, define

    Compile⟦ℛ⟧ Γ_ℛ α                   = Γ_ℛ(α)
    Compile⟦ℛ⟧ Γ_ℛ (T^n τ_1 ... τ_n)   = (ℛ⟦T^n⟧) (Compile⟦ℛ⟧ Γ_ℛ τ_1, ..., Compile⟦ℛ⟧ Γ_ℛ τ_n)
    Compile⟦ℛ⟧ Γ_ℛ (∀α_1 ... α_n. τ)   = Compile⟦ℛ⟧ Γ_ℛ (τ[T^0_{α_i} / α_i])

Figure 8.5: Type-based Translation at Compile-time.

In our current implementation, the type-code interpretation mechanism of Figure 8.4 is built
into the run-time system. The marking process is invoked for each type-reconstructed activation
frame present in the dynamic activation tree. The run-time system constructs the exact
run-time type encoding of every frame-slot in the given activation frame and then directly dispatches
to the appropriate marking function based on the datatype class as specified in Figure 8.3. The
marking process is further optimized based on the actual representation chosen for a particular
class of datatypes as shown in Figure 8.1. For example, the marking function for linearized
arrays computes the total size of the array and marks each of its elements in a single loop. In
case of algebraic types, nullary disjuncts under enumerated or implicit representation are never
marked, a product disjunct is always marked, and a tag dispatch is made for explicitly tagged
disjuncts. Finally, the hidden arguments inside function closures are traversed and marked
according to their reconstructed hidden closure types.
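A small Python sketch of interpreted marking follows. It mirrors the shape of Figures 8.3 and 8.4 for algebraic datatypes only; the `Obj` class, the `DATATYPES` table, and the tuple representation of encoded types are hypothetical stand-ins for the run-time system's descriptors, and representation-specific optimizations (such as never marking nullary disjuncts) are deliberately left out.

```python
class Obj:
    """A heap object: constructor name, fields, and a mark bit."""
    def __init__(self, con, *fields):
        self.con, self.fields, self.marked = con, fields, False

# name -> (type parameters, {constructor: static field types})
DATATYPES = {
    "list": (("a",), {"Nil": (), "Cons": ("a", ("list", "a"))}),
    "pair": (("a", "b"), {"Pair": ("a", "b")}),
}
BASE_TYPES = {"int", "float"}

def tenc(ty, env):
    """TEnc: encode static type `ty` under the substitution {alpha_i -> z_i}."""
    if isinstance(ty, str):
        return env[ty] if ty in env else (ty,)
    return (ty[0],) + tuple(tenc(t, env) for t in ty[1:])

def mark_T(name, *z):
    """mark_T(z_1, ..., z_n): a marker specialised to encoded type arguments."""
    params, constructors = DATATYPES[name]
    env = dict(zip(params, z))
    def mark(x):
        if x.marked:
            return
        x.marked = True                                   # Mark(x)
        for fty, field in zip(constructors[x.con], x.fields):
            interpret(tenc(fty, env))(field)              # recursive interpretation
    return mark

def interpret(enc):
    """Interpret: dispatch on the head descriptor of an encoded type."""
    head, args = enc[0], enc[1:]
    if head in BASE_TYPES:
        return lambda x: None                             # scalars: nothing to mark
    return mark_T(head, *args)

xs = Obj("Cons", Obj("Pair", 1, 2.0), Obj("Cons", Obj("Pair", 3, 4.0), Obj("Nil")))
garbage = Obj("Pair", 0, 0.0)
interpret(("list", ("pair", ("int",), ("float",))))(xs)
print(xs.marked, xs.fields[0].marked, garbage.marked)
```

Every field visit rebuilds an encoding with `tenc` and dispatches through `interpret`, which is the per-object cost that the compiled scheme of the next section aims to avoid.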

8.4.2 Compiled Marking 

Rather than interpreting type encodings as in the interpreted marking schema, it is also
possible to generate compiled marking functions for each datatype that know how to traverse the
object directly without any type interpretation. In this Compiled Marking Schema ℳ', for each
datatype T^n the compiler automatically generates a mark function mark'_T that is
parameterized by n mark function arguments f_1 ... f_n instead of encoded type arguments. This alternate
marking schema ℳ' can be directly obtained from our interpreted marking schema ℳ shown
in Figure 8.3 by replacing the recursive call for interpretation:

    Interpret⟦ℳ⟧ (TEnc⟦τ⟧ {α_i ↦ z_i})

by a type-based function composition:

    (Compile⟦ℳ'⟧ {α_i ↦ f_i} τ)

This transformation expresses the fact that building the exact run-time type encoding of an
object and then interpreting it to guide the traversal and marking is functionally equivalent
to directly traversing it using a compiled marking function that knows the structure of that
object.

The general mechanism of type-based function composition (Compile⟦ℛ⟧ Γ_ℛ τ) for an
arbitrary schema ℛ (such as the compiled marking schema ℳ') is shown in Figure 8.5.
Generating compile-time type encodings as shown in Figure 7.3 may be thought of as a special
case of this mechanism. This mechanism translates a given static type τ into a composition
of schema functions specified by ℛ under a translation environment Γ_ℛ that maps free
type variables of τ to schema-dependent values. For the case of the compiled marking schema,
(Compile⟦ℳ'⟧ {α_i ↦ f_i} τ) creates a function composition that is capable of marking an
object whose type is a run-time instance of the static type τ. Note that the marking function
so generated does not contain any type-code interpretation. Its execution directly results in
the appropriate traversal and marking of the given object.

The compiled marking process is initiated by converting the reconstructed type-map of each 
activation frame into a composition of compiler-generated marking functions. This translation 
is similar to the type-based function composition shown in Figure 8.5 except that it operates 
on type encodings rather than static types. The resulting function composition may be directly 
applied to the given activation frame to mark all heap objects reachable from that frame. The 
compiled marking schema is currently unimplemented. 
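Since the compiled schema is described but not implemented, the following Python sketch is only an illustration of the idea under assumed data structures (`Obj`, `DATATYPES`, and types as strings and tuples are hypothetical). Mark functions are parameterized by other mark functions, and `compile_mark` plays the role of Compile⟦ℳ'⟧: it maps a type variable to its entry in the environment and a constructed type to the application of its mark function to the compiled arguments.

```python
class Obj:
    """A heap object: constructor name, fields, and a mark bit."""
    def __init__(self, con, *fields):
        self.con, self.fields, self.marked = con, fields, False

# name -> (type parameters, {constructor: static field types})
DATATYPES = {
    "list": (("a",), {"Nil": (), "Cons": ("a", ("list", "a"))}),
    "pair": (("a", "b"), {"Pair": ("a", "b")}),
}

def noop(x):
    """Dummy mark function for scalars and unconstrained type variables."""
    return None

def compile_mark(ty, env):
    """Translate a static type into a composition of mark functions."""
    if isinstance(ty, str):
        return env.get(ty, noop)          # type variable, or a scalar base type
    return mark_prime(ty[0], *(compile_mark(t, env) for t in ty[1:]))

def mark_prime(name, *fs):
    """mark'_T(f_1, ..., f_n): parameterised by mark functions, not encodings."""
    params, constructors = DATATYPES[name]
    env = dict(zip(params, fs))
    def mark(x):
        if x.marked:
            return
        x.marked = True
        for fty, field in zip(constructors[x.con], x.fields):
            compile_mark(fty, env)(field)  # composed lazily, so recursion terminates
    return mark

mark_slot = compile_mark(("list", ("pair", "int", "float")), {})
xs = Obj("Cons", Obj("Pair", 1, 2.0), Obj("Cons", Obj("Pair", 3, 4.0), Obj("Nil")))
mark_slot(xs)
print(xs.marked, xs.fields[1].fields[0].marked)
```

No encoded type is built or inspected while traversing `xs`; in a real compiler the inner `compile_mark` calls would be resolved statically rather than at each visit, which this sketch does not attempt.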

8.4.3 Variations on Marking Schemes 

The interpreted and the compiled marking schemes described above are just a few among a
full spectrum of possible marking schemes that depend on the degree of type specialization
performed at compile-time and degree of type interpretation performed at run-time. For
instance, it is possible to have a marking schema that takes an intermediate position between the
completely interpreted schema ℳ and the completely compiled schema ℳ'. In this schema,
calls to the top-level interpretive dispatch (Figure 8.4) may be statically specialized to call the
marking functions of schema ℳ directly, although dynamic type-hints may still have to be
interpreted at run-time.

It is also possible to specialize the type-hint propagation and the type reconstruction
mechanism described in the last chapter (Section 7.2 and Section 7.3) for the explicit purpose of
object marking. In this scheme, the compiler would insert code to generate and propagate
type-hints (Section 7.2.3) that consist of compositions of mark functions rather than run-time
type encodings. The type reconstruction algorithm (Section 7.4.4) would also be modified to
deal with such type-hints and the algorithm would return a higher-order composition of mark
functions for the given activation frame rather than a reconstructed type-map. The mark
function so obtained would be directly applied to the activation frame to mark all heap objects
accessible from it.⁵

An independent variation for any of the compiled marking schemes is to generate as many 
specialized marking functions as possible at compile-time for every static type occurring in the 
program rather than generating compositions of a fixed set of datatype marking functions as 
shown above. This would clearly reduce the overhead of using higher-order marking functions. 

8.5 *T Implementation 

*T is a parallel, distributed-memory machine with a high performance interconnection
network [NPA92, PBGB93]. The *T architecture extends a basic RISC instruction set with
low-overhead, user-mode communication and synchronization primitives. The details of the
architecture may be found elsewhere [Bec92]. In this section, we briefly summarize some of the
design features and the terminology of the *T architecture that are relevant to the
implementation of Id on *T and then describe our implementation of distributed garbage collection on
this machine.

5 Readers familiar with Haskell's type classes [HWe90, WB89] would immediately recognize that in Haskell,
we can accommodate all variations of type reconstruction and its applications by declaring a universal class tree
that provides type encodings, mark functions, print functions etc. as independent methods.

8.5.1 Multi-threaded Execution: Processor View 

In our study, we used a simulator for the *T architecture based on Motorola's 88110MP
processor. The 88110MP is a super-scalar RISC processor extended with an on-chip message and
synchronization unit (MSU) which provides hardware support for scheduling microthreads. A
microthread is a compiler-defined sequence of instructions executing within the context of an
activation frame. A microthread descriptor identifying a microthread consists of an instruction
pointer (IP) and a frame pointer (FP) (refer to Figure 6.2). A microthread, by definition, executes
to completion once it has been invoked. It may send messages or fork other microthreads that
are deposited in a stack of ready-to-run microthreads.

*T processors communicate with each other by sending messages via the network. Messages
consist of 4 to 24 32-bit words. Due to the on-chip message unit, *T messages may be
dispatched and handled very quickly using the general-purpose processor registers directly (6 and
12 instructions respectively for a full-sized message). Messages always contain a microthread
descriptor as the first two words of payload. Normally, messages are handled by invoking the
microthread described within the message, so these microthreads are termed message handlers.

A microthread's last operation is to schedule the next microthread of the highest priority
which is selected from a simple priority queue consisting of handlers of incoming messages, the
microthread stack, and several microthread registers. Message handlers have higher priority
than computation microthreads.
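The scheduling priority can be sketched with a small Python simulation. The `Processor` class and its methods are hypothetical, microthreads are modeled as plain callables, and the microthread registers mentioned above are omitted; the sketch only shows run-to-completion execution with message handlers taken before stacked computation microthreads.

```python
from collections import deque

class Processor:
    def __init__(self):
        self.messages = deque()   # handlers of incoming messages (FIFO)
        self.stack = []           # ready-to-run computation microthreads (LIFO)
        self.log = []

    def receive(self, handler):
        self.messages.append(handler)

    def fork(self, thread):
        self.stack.append(thread)

    def run(self):
        """Dispatch loop: each microthread runs to completion, then the next is chosen."""
        while self.messages or self.stack:
            if self.messages:                 # message handlers have priority
                thread = self.messages.popleft()
            else:
                thread = self.stack.pop()
            thread(self)

def handler(p):
    p.log.append("handler")
    p.fork(lambda q: q.log.append("compute B"))   # a handler may fork a microthread

p = Processor()
p.fork(lambda q: q.log.append("compute A"))
p.receive(handler)
p.run()
print(p.log)
```

The handler runs first even though "compute A" was forked earlier, and the microthread it forks is taken from the top of the stack before "compute A".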

8.5.2 Multi-threaded Execution: System View 

*T runs a Unix-like operating system. A parallel job running on *T consists of a separate
process, or a player, on each processor. Players belonging to the same parallel job are scheduled
at the same time on their respective processors by the operating system. The players have
independent 32-bit virtual address spaces, but may refer to a global 64-bit address space through
the MSU by sending messages to each other.

The Id compiler and its run-time system for *T provide the high-level abstraction of a
single, implicitly parallel program running within a shared, global address space as shown in
Figure 6.2. The Id compiler statically partitions the user program into several microthreads that
are scheduled dynamically during execution. Microthreads communicate and synchronize with
each other via messages. Microthreads belonging to a single Id procedure execute within the
context of a shared activation frame and may also communicate with each other via the frame.
Since successively scheduled microthreads on a processor may be completely independent, the
general-purpose registers within the processor are kept local to a microthread and are not
used to communicate data across microthreads. However, registers may still be used to pass
parameters to C functions called within a single microthread.

The Id run-time system consists of the frame manager, the heap manager, and protocol
handlers for I-structure and M-structure memory operations [CCF+93]. All run-time system
calls are initiated and serviced as split-phase transactions. A microthread sends a message to a
run-time system request handler passing it the descriptor of a microthread that would receive
the reply. The request handler services the request and returns the result in a message to the
reply handler provided with the request. This scheme ensures that computation microthreads
never block the processor pipeline and can always run to completion.⁶ This invariant guarantees
that run-time system exceptions such as running out of frame or heap memory always happen
at the boundary of a computation microthread. At that moment, none of the general-purpose
registers contain any live data and the complete root set of heap objects is available within the
tree of activation frames.
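A split-phase transaction can be sketched in Python as two message sends connected by a reply handler. The message queue, the handler names, and the allocation policy are all hypothetical; the sketch only illustrates that the requesting microthread finishes without waiting and that the result arrives later through the reply handler named in the request.

```python
from collections import deque

network = deque()                       # in-flight messages: (handler, payload)

def send(handler, *payload):
    network.append((handler, payload))

heap_state = {"next_free": 0x1000}
frame = {}                              # activation frame shared by the microthreads

def alloc_handler(size, reply):
    """Run-time system request handler: services the request, then replies."""
    addr = heap_state["next_free"]
    heap_state["next_free"] += size
    send(reply, addr)                   # second phase travels as a new message

def reply_thread(addr):
    """Reply handler named in the request; it resumes the computation."""
    frame["ptr"] = addr

def requesting_thread():
    send(alloc_handler, 4, reply_thread)   # first phase: issue the request ...
    frame["requested"] = True              # ... and finish without waiting

def dispatch():
    while network:
        handler, payload = network.popleft()
        handler(*payload)

requesting_thread()
pending = "ptr" not in frame       # the requester completed before any reply arrived
dispatch()
print(pending, hex(frame["ptr"]))
```

Because every microthread ends before the next message is handled, any exception raised while servicing the request falls on a microthread boundary, which is the property the garbage collector relies on.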

The Id run-time system sets up the players participating in a parallel job to continuously
execute a microthread dispatch loop where microthreads are scheduled according to the priority
scheme described earlier. One of the players (processor 0) is set up to allocate the root activation
frame and launch the first microthread along with its user-supplied arguments. It also receives
the final result and coordinates the termination of the parallel job.

8.5.3 Memory Organization 

For the purpose of executing Id programs, the *T machine is logically divided into two
kinds of nodes: computation nodes and memory nodes (see Figure 8.6). The computation
nodes manage the dynamic tree of activation frames and execute computation microthreads
while the memory nodes manage the heap memory and handle various protocols for memory
references.

The address space of a player running on a *T processor is divided into several areas that 
are themselves distributed or replicated across the nodes as shown in Figure 8.6. 

The code and static data areas are replicated on all nodes — each node gets a copy of the 
whole program and all of its constants. Each node also has a stack that is used for calling into 
C procedures from Id. The Id run-time system is implemented in C and may also use the C 
stack. 

The frame area on the computation nodes contains the activation frames for every Id
procedure invocation. When a procedure is invoked, the run-time system chooses a processor on
which to allocate its frame according to a built-in load balancing strategy. Then, the run-time
system sends a frame allocation request to that processor in a split-phase transaction, which
allocates a frame in its own frame area and returns a pointer to it to the calling routine. This
mechanism distributes the dynamic tree of activation frames across all the computation nodes.

An activation frame is deallocated by the last microthread of its associated procedure and
may be reused subsequently. In order to avoid confusion due to stale data lying around from
previous allocations, the Id compiler arranges for the first microthread of each procedure to clear
all frame-slots that may contain pointers. This helps in identifying valid data within the frame
during garbage collection.

The heap area on the memory nodes contains all of the heap-allocated Id objects. The heap
area is further divided into the interleaved and the non-interleaved area. The non-interleaved
area is used for small sized objects contained wholly within the same node, while the interleaved
area is used to allocate large objects that are spread across all the memory nodes to avoid
allocation imbalance and reduce memory contention. In order to simplify our study, we only
used the non-interleaved heap area.

In our implementation of Id on *T, all scalar objects and pointers to heap objects are 64
bits in size. Furthermore, these pointers are always aligned on 8-byte boundaries when stored
in memory. Each 64-bit double word in the heap has an associated 2-bit presence value in the
presence-bit area. These presence bits are used to implement Id's I-structure [ANP89] and
M-structure [BNA91] synchronization operations.



6 If the network is blocked, the message is buffered and is tried again at a later point. Thus, the currently
executing microthread is guaranteed to terminate without blocking.



[Figure 8.6 (diagram not reproduced) depicts the following organization.]

  Memory Nodes (IS0, IS1, ..., ISn), each containing:
      C Stack, Presence bits, Non-Interleaved Heap, Interleaved Heap, Code

  Computation Nodes (PE0, PE1, ..., PEn), each containing:
      C Stack, Frames, Code

Figure 8.6: The Organization of Computation Nodes and Memory Nodes in the *T machine.

We also use the non-interleaved heap area to keep any deferred-read and locked-take
continuations for the I-structure and M-structure operations respectively. These continuations
represent incomplete split-phase memory accesses whose second phase would complete when
the corresponding heap data becomes available. Therefore, the heap objects carrying these
continuations are always considered to be live and should never be garbage collected. On the
other hand, since our system does not perform tail-calls, pointers to activation frames
contained within such continuations are always accessible through the dynamic tree of activation
frames. Therefore, these continuations do not have to be scanned for live pointers. Currently,
our run-time system permanently marks such objects as live and manages their allocation and
deallocation separately. Also, the garbage collector treats their contents as scalar data. A
cleaner solution would have been to designate a separate heap area for allocating such deferred
continuations so that the garbage collector never sees them.
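The role of presence state and deferred-read continuations can be illustrated with a Python sketch of an I-structure. The three state names and their numeric values are assumptions (the text only says that each word has a 2-bit presence value), continuations are modeled as Python callables, and M-structure take and put operations are not modeled.

```python
EMPTY, PRESENT, DEFERRED = 0, 1, 2     # hypothetical encodings of the presence value

class IStructure:
    def __init__(self, n):
        self.state = [EMPTY] * n
        # Holds the value when PRESENT, or the list of waiting readers when DEFERRED.
        self.value = [None] * n

    def fetch(self, i, cont):
        """Split-phase read: `cont` receives the value now or once it is written."""
        if self.state[i] == PRESENT:
            cont(self.value[i])
            return
        if self.state[i] == EMPTY:
            self.state[i], self.value[i] = DEFERRED, []
        self.value[i].append(cont)       # deferred-read continuation

    def store(self, i, v):
        """Write-once store that also completes all deferred reads."""
        if self.state[i] == PRESENT:
            raise RuntimeError("I-structure slot written twice")
        waiting = self.value[i] if self.state[i] == DEFERRED else []
        self.state[i], self.value[i] = PRESENT, v
        for cont in waiting:
            cont(v)

cells = IStructure(2)
seen = []
cells.fetch(0, seen.append)     # slot is empty, so the read is deferred
cells.store(0, 42)              # the write satisfies the deferred read
cells.fetch(0, seen.append)     # slot is present, so the read completes at once
print(seen)
```

While a slot is in the deferred state, the list of waiting continuations is the kind of object that the collector must treat as permanently live, as described above.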

Our compiler and run-time system never store a pointer to the interior of an object in a
frame-slot or another Id object. Therefore, a pointer found within a frame or a heap object
always points to the head of the active area of the object. The active area of the object is
actually preceded in memory by some information managed by the run-time system including
the object's size (used for deallocation), a mark-bit (used by the garbage collector), and the
time when it was allocated (in instruction cycles, for statistics collection).

8.5.4 Garbage Collection on *T 

Garbage collection on *T can be initiated either by request from the Id program or by the 
run-time system when one of the processors finds out that it is running out of heap storage. 
Our current policy is to initiate garbage collection when the allocated storage on a node reaches 
a specified fraction (say, 0.75) of its total storage.
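The triggering policy amounts to a single comparison per node. A minimal sketch, in which the function name and the word-based units are assumptions for illustration:

```python
def should_collect(allocated_words, total_words, threshold=0.75):
    """Return True once the allocated storage on a node reaches the given fraction."""
    return allocated_words >= threshold * total_words
```

For example, with a node of 1000 words and the default threshold, collection would be requested once 750 words are allocated.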

Since the heap is shared globally, all processors must participate in a global garbage
collection. Therefore, when one processor decides to do garbage collection, all other processors are
informed about it. Currently, we have implemented a simple stop-and-collect garbage collection
scheme.

First, the computation nodes stop processing computation microthreads and drain all
messages out of the network because the messages may carry live pointers to heap objects. As
messages are drained from the network, their handlers are invoked. Our compiler ensures that
the computation message handlers may modify memory locations or fork other microthreads,
but they are not allowed to send more messages [footnote 7]. We can handle all messages and eventually
reach quiescence, as long as we do not run any threads scheduled by the message handlers.
Since we invoke message handlers as the network drains, there are no queues of messages to
consider as part of the root-set during garbage collection.

Once the network is drained, all processors synchronize and then initiate the mark phase.
In this phase, all live and reachable objects residing on the memory nodes are marked according
to one of the object identification techniques, starting from the distributed tree of activation
frames residing on the computation nodes. This process requires global communication among
processors to mark objects distributed across the machine. After global marking is completed
on all nodes, the processors synchronize again and then each memory node begins a local sweep
phase. A final synchronization is performed after sweeping is completed on all nodes, and then
the Id threads are allowed to resume computation on the computation nodes.
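The ordering of the phases can be illustrated on a single node. The sketch below is a simplification under stated assumptions: the network is modelled as a queue of `(handler, argument)` pairs, the heap as a dictionary of objects with explicit pointer fields, and the inter-processor synchronization points are omitted entirely. None of these names come from the actual run-time system.

```python
from collections import deque


def stop_and_collect(network, frames, heap):
    """network: deque of (handler, arg); frames: list of root lists;
    heap: dict mapping id -> {"fields": [ids], "marked": bool}."""
    deferred_threads = []
    # Phase 1: drain the network. Handlers may update memory or fork
    # threads, but forked threads are only queued here, never run.
    while network:
        handler, arg = network.popleft()
        handler(arg, deferred_threads)
    # Phase 2: mark everything reachable from the activation frames.
    stack = [ref for frame in frames for ref in frame]
    while stack:
        obj = heap[stack.pop()]
        if not obj["marked"]:
            obj["marked"] = True
            stack.extend(obj["fields"])
    # Phase 3: sweep unmarked objects, then clear marks for the next cycle.
    for ref in [r for r, o in heap.items() if not o["marked"]]:
        del heap[ref]
    for obj in heap.values():
        obj["marked"] = False
    # The deferred threads are resumed only after collection has finished.
    return deferred_threads
```

Draining before marking is what allows the root set to consist of the activation frames alone: once the queue is empty, no live pointer can be hiding in an undelivered message.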

Type-Reconstructed Garbage Collection 

The mark phase of the Type-Reconstructed Garbage Collection (TRGC) follows the
compiler-directed object identification scheme described earlier. Currently, we have only implemented
the interpreted marking scheme with full type reconstruction as described in Section 8.4.1.

During the mark phase, the frame memory of each computation node is traversed locally
to find the activation frames that belong to the current dynamic activation tree. Each
activation frame that is currently in use is type-reconstructed according to the algorithm shown in
Figure 7.5. Since the dynamic activation tree is distributed across processors, this process may
require sending messages to non-local parent activation frames in order to obtain their use-type
instantiations.

Once a frame is reconstructed, its slots are searched for heap objects to be marked using their
fully reconstructed types. We directly follow the type-code interpretation scheme of Figure 8.4
by examining the type constructor for the current frame-slot to see if it refers to a structured
datatype. If so, the value in the frame-slot is parsed as a pointer and a request for marking the
corresponding object is sent to its home node along with its fully reconstructed type packed
within the requesting message. At the home node, the object and its contents are marked
according to the marking schema shown in Figure 8.3.

Footnote 7: The run-time system message handlers are still allowed to send reply messages, but
the number of such messages is fixed.

Note that, although type reconstruction of a frame must precede marking within that frame,
it may be overlapped with type reconstruction or marking of other frames or heap objects.

Conservative Garbage Collection 

The mark phase of the Conservative Garbage Collection (CGC) requires no source type
information. Conservative garbage collectors use a simpler conservative test to determine whether
a value in a frame or a heap object is a pointer to another object. Since pointers are identified
conservatively, CGC may assume that there are live references to an object when there are none;
therefore some objects may remain uncollected. Also, CGC cannot compact or copy all objects
because conservatively identified pointers cannot be updated. However, there are some more
sophisticated schemes that allow compaction and/or copying of a large fraction of the heap
objects [Bar88]. Finally, CGC has no knowledge of the source types; therefore it must examine
every slot of every reachable object and no short-circuiting based on scalar-type information is
possible.

As in the case of TRGC, the mark phase of CGC begins on the computation nodes by
traversing their frame memory and identifying the activation frames currently in use. For each
activation frame in use, we apply the conservative pointer test on each of its frame-slots as
follows:

1. First, we check to see if the 64-bit value contained within the frame slot is non-zero and
is aligned to a 64-bit boundary. If not, then the value is a scalar.

2. Next, we parse the value as a potential global pointer and determine its home node. If
the node address falls outside the known range of addresses for memory nodes, the value
is a scalar.

3. Finally, we send a message to the home node to check if the value is a valid pointer. At
the home node, we test whether the value points within the allocated heap area and whether
it points to the head of an actual heap object. The latter test is made possible because
the run-time system marks the head of each allocated object with a special presence-bit
pattern. Furthermore, the system guarantees that actual pointers never point to the
interior of objects. Therefore, this test may be carried out by simply checking for the
special presence-bit pattern at the head of the pointer value. If this test succeeds then
the value is considered to be an actual pointer and the object is marked; otherwise the
value is taken to be a scalar.

The test may mark some objects that are not actually reachable because a value in memory
happens to look like a pointer to that object. However, the test is guaranteed to mark only
actual heap objects because it checks for the special allocation presence-bit pattern.

Once a value has been determined to be a pointer, the fields of the object it points to are
scanned for potential references to other objects in a similar fashion.
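The three-step test above can be sketched as follows. The global pointer encoding is an assumption made purely for illustration (node number in the bits above `NODE_SHIFT`, byte address below it), as are the names `MemoryNode` and `conservative_mark`. The set `object_heads` stands in for the presence-bit pattern, and a dictionary lookup stands in for the message sent to the home node.

```python
NODE_SHIFT = 48                       # hypothetical: node number in the top bits
ADDR_MASK = (1 << NODE_SHIFT) - 1     # hypothetical: local byte address below it


class MemoryNode:
    def __init__(self, heap_start, heap_end):
        self.heap_start, self.heap_end = heap_start, heap_end
        self.object_heads = set()     # stands in for the presence-bit pattern
        self.marked = set()

    def is_object_head(self, addr):
        in_heap = self.heap_start <= addr < self.heap_end
        return in_heap and addr in self.object_heads


def conservative_mark(value, nodes):
    """Apply the three tests; mark the object and return True if all pass."""
    # Step 1: non-zero and aligned to a 64-bit (8-byte) boundary.
    if value == 0 or value % 8 != 0:
        return False
    # Step 2: the home node must be a known memory node.
    node_id, addr = value >> NODE_SHIFT, value & ADDR_MASK
    if node_id not in nodes:
        return False
    # Step 3: at the home node, the address must be the head of an object.
    node = nodes[node_id]
    if not node.is_object_head(addr):
        return False
    node.marked.add(addr)
    return True
```

The first two steps are purely local and cheap, so most scalars are rejected before any communication with a memory node is needed.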






Compiler-Directed Storage Reclamation 

For comparison purposes, we have also implemented the explicit, compiler-directed storage
reclamation scheme (CDSR) within the same compiler and run-time system framework. In
this scheme, no separate garbage collection needs to be performed: the compiler inserts code
to deallocate an object when it can determine the object to be garbage. This analysis has a
substantial compile-time cost compared to the other two storage management schemes. Also,
the static analysis may not be able to reclaim all the garbage that is generated by the program.

The run-time costs of this scheme may be divided into a small synchronization cost that 
schedules the deallocation of an object when all its references are dead and the actual cost of 
deallocating the object. The former cost is negligible and is also hard to separate from the user 
program because it is built into the microthread partitioning and synchronization of the user 
program. The second cost is the same as the basic cost of sweeping the unused objects as in the 
other garbage collection schemes and therefore forms the basis of our comparison with those 
schemes. 

We use the CDSR scheme to compare its relative storage management efficiency to that 
of the garbage collected schemes. It is also possible to simultaneously use the explicit storage 
management scheme to get most of the large objects along with a garbage collector that catches 
the smaller, harder to analyze objects. We believe that a mixed approach may yield better
performance than either scheme on its own. 

8.6 Performance Results and Analysis 

We are interested in two aspects of the performance of the type-reconstructed garbage collection
(TRGC): how long it takes to garbage collect, and how much garbage it reclaims. We compared
several programs running with TRGC, conservative garbage collection (CGC), and
compiler-directed storage reclamation (CDSR).

In preparing a uniform execution platform, we naturally had to accommodate the
requirements of each storage management scheme within the same run-time system. This resulted in
a system that was not tuned to any particular storage management scheme. For instance, a
copying or compacting garbage collector could not be used for TRGC since our simple-minded
scheme for conservative garbage collection would not work in that setup. Similarly, the
run-time system had to maintain free-lists for reclaimed objects since we wanted to perform explicit
storage management within the same framework.

Thus, the results we obtained cannot be treated as an absolute measure of performance for
any particular scheme. On the other hand, they provide a good measure of relative performance
of the object identification mechanisms studied and also characterize systems where more than
one storage management strategy is used.

8.6.1 Benchmark Runs 

We used four different benchmarks. Quicksort is the standard recursive algorithm for sorting
N list elements parameterized by a polymorphic comparison predicate. Paraffins generates and
counts the number of distinct paraffin isomers of up to N carbon atoms. Gamteb is a Monte
Carlo simulation of N photons impinging on a carbon rod divided into two cells. Finally,
Wavefront consists of 10 iterations of a successive over-relaxation kernel on an N x N matrix
containing floating-point data.






Quicksort: instruction cycles (x1000); Heap in words (Wds).
A dash (-) marks an entry left blank for that scheme.

                                   |----------- Id RTS ------------|
Mode  Input N    Heap  GCs     Id  Basic   Mark  Sweep   TREC  Total   Idle  Total
TRGC       25    5628    2    488    209    109     17     22    372      3    863
CGC        25    5640    2    488    208    137     16      -    367      5    860
CDSR       25    5236    -    519    193      -      -      -    195      7    721
TRGC       50    8640    2   1121    513    201     30     43    812      8   1942
CGC        50    8628    2   1113    497    179     30      -    714      5   1833
CDSR       50   10936    -   1185    466      -      -      -    469      7   1661
TRGC       75   15000    2   1736    810    492     51    129   1549      7   3293
CGC        75   15004    2   1717    783    179     48      -   1019      7   2743
CDSR       75   17328    -   1852    747      -      -      -    748     11   2611
TRGC      100   18752    2   2348   1106    436     62     95   1747      8   4103
CGC       100   18756    2   2309   1057    414     63      -   1548     11   3868
CDSR      100   25272    -   2490   1012      -      -      -   1013     14   3517

Paraffins: instruction cycles (x1000); Heap in words (Wds).

                                   |----------- Id RTS ------------|
Mode  Input N    Heap  GCs     Id  Basic   Mark  Sweep   TREC  Total   Idle  Total
TRGC       10    8870    2    678    352    123     30     11    528     20   1225
CGC        10    8870    2    681    354     89     30      -    481     16   1177
CDSR       10   10690    -    700    308      -      -      -    311     40   1051
TRGC       11   15760    2    963    538    286     52     19    905     40   1908
CGC        11   15760    2    964    538    185     52      -    782     38   1784
CDSR       11   17572    -   1000    462      -      -      -    465     87   1553
TRGC       12   28144    3   1482    900    620     93     38   1660     45   3187
CGC        12   28148    3   1487    902    387     93      -   1389     43   2920
CDSR       12   30722    -   1523    749      -      -      -    752    107   2383
TRGC       13   46884    3   2521   1607   2765    234    145   4763    121   7405
CGC        13   46884    3   2528   1608   1726    234      -   3576    114   6218
CDSR       13   58682    -   2566   1299      -      -      -   1302    292   4160

Figure 8.7: Performance Results for Quicksort and Paraffins.



For each of the programs we tested, we ran three versions: TRGC, CGC, and CDSR. The
TRGC version is the program running with type-reconstructing garbage collection. The CGC
version is running with conservative garbage collection, and the CDSR version is the automatically
annotated version running with no garbage collection. Both garbage collectors used the mark
and sweep algorithm, and used the same implementation of sweeping and inter-processor
synchronization. Using a simple GC algorithm allowed us to separate the basic heap management
cost (allocation and deallocation) from the overall cost of garbage collection. Thus, the cost
of object traversal and marking of TRGC and CGC can be truly ascribed to their respective
object identification strategies.

In all three cases, actual heap storage management and statistics collection were performed by
the same Id run-time system. Although statistics gathering was mildly intrusive, it constituted
a tiny fraction of total cycles executed. Online statistics processing (re-sampling profiles) was
not counted.






Gamteb: instruction cycles (x1000); Heap in words (Wds).
A dash (-) marks an entry left blank for that scheme.

                                   |----------- Id RTS ------------|
Mode  Input N    Heap  GCs     Id  Basic   Mark  Sweep   TREC  Total   Idle  Total
TRGC       25    5634    3   1948    289     59     15      6    381     65   2394
CGC        25    5634    3   1950    291    102     15      -    417     58   2425
CDSR       25    1780    -   2000    313      -      -      -    316     56   2371
TRGC       50   11278    3   3837    586     73     30      8    709    123   4668
CGC        50   11278    3   3824    584    117     30      -    739    119   4682
CDSR       50    1780    -   3929    627      -      -      -    631    111   4671
TRGC       75   16812    2   5485    836     35     34      4    919    191   6594
CGC        75   16812    2   5490    840     62     34      -    942    172   6604
CDSR       75    1712    -   5628    906      -      -      -    913    173   6714
TRGC      100   22506    2   7150   1096     70     46      7   1228    246   8624
CGC       100   22506    2   7159   1101     97     45      -   1250    227   8636
CDSR      100    1840    -   7355   1191      -      -      -   1198    222   8775

Wavefront: instruction cycles (x1000); Heap in words (Wds).

                                   |----------- Id RTS ------------|
Mode  Input N    Heap  GCs     Id  Basic   Mark  Sweep   TREC  Total   Idle  Total
TRGC       10    1726    3    495     22     21      1      2     56     50    601
CGC        10    1726    3    495     22     49      1      -     79     50    624
CDSR       10    1078    -    518     21      -      -      -     23     48    589
TRGC       20    5256    5   1772     41     41      2      3    100    224   2096
CGC        20    5316    5   1769     41    242      2      -    295    224   2288
CDSR       20    1300    -   1821     40      -      -      -     42    211   2074
TRGC       30   10500    5   3922     65     41      2      3    124    523   4569
CGC        30   10500    5   3921     65    462      2      -    540    524   4985
CDSR       30    6548    -   4064     64      -      -      -     66    494   4624
TRGC       40   20680    5   6946    113     41      2      3    172    945   8064
CGC        40   20740    5   6934    113    832      2      -    957    947   8839
CDSR       40   12692    -   7191    112      -      -      -    114    893   8198

Figure 8.8: Performance Results for Gamteb and Wavefront.







We simulated several problem sizes on a single processor with each program and storage
management scheme. Figure 8.7 and Figure 8.8 show the performance results for each of the
benchmarks. The first two columns identify the storage management scheme (Mode) and the
input size (N). The next two columns show the maximum heap size used (Heap) during each run
measured in 32-bit words, and the number of garbage collections performed (GCs). Subsequent
columns record timing information for various categories of instructions measured in Kcycles.
In each of the garbage collected runs, the run-time system initiated the garbage collection when
the currently allocated space exceeded 75% of the total heap space. Garbage collection was
switched off for CDSR runs.

[Figure 8.9 (plots): four panels of instruction cycles (vertical axis ticks from 2000 to 10000)
against problem size: Quicksort (List Size), Paraffins, Gamteb (Particles), and a panel against
Matrix Size. Each panel plots TRGC Total, CGC Total, CDSR Total, TRGC RTS, CGC RTS, and
CDSR RTS.]

Figure 8.9: Total Cost and Run-time System Cost for the Benchmarks.

The timing information for each benchmark run is broken up into several categories. The
amount of time spent in Id computation threads (Id) includes basic computation work,
math-library subroutine calls, split-phase memory referencing and program I/O. The time spent in
the run-time system (Id RTS) is classified into the time spent in basic storage management
(allocation/deallocation), frame and object marking during garbage collections, object
sweeping, and type reconstruction. The remaining time is spent idling through the scheduling loop
waiting for messages to arrive through the network [footnote 8].
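As a sanity check on how the columns of Figure 8.7 and Figure 8.8 fit together, the overall Total of a run should equal Id plus the Id RTS total plus Idle. The sketch below copies four rows from those tables. The helper names and the tolerance of one Kcycle (to absorb rounding in the reported figures) are assumptions of this example.

```python
# (Id, Id RTS total, Idle, Total) in thousands of instruction cycles,
# copied from Figure 8.7 and Figure 8.8.
RUNS = {
    ("Quicksort", "TRGC", 100): (2348, 1747, 8, 4103),
    ("Quicksort", "CDSR", 100): (2490, 1013, 14, 3517),
    ("Gamteb", "TRGC", 100): (7150, 1228, 246, 8624),
    ("Wavefront", "CGC", 40): (6934, 957, 947, 8839),
}


def components_match(run, tolerance=1):
    """Check that Id + Id RTS + Idle agrees with Total up to rounding."""
    id_cycles, rts, idle, total = RUNS[run]
    return abs(id_cycles + rts + idle - total) <= tolerance


def rts_fraction(run):
    """Fraction of all cycles spent in the run-time system."""
    _, rts, _, total = RUNS[run]
    return rts / total
```

With these rows, the run-time system accounts for roughly 43% of the cycles in the TRGC Quicksort run of 100 elements, but only about 14% in the TRGC Gamteb run of 100 particles.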

8.6.2 Performance Analysis 
Time Analysis 

The total instruction cycles and the cycles spent in the run-time system (including garbage 
collection) for all the runs are summarized in Figure 8.9. These curves give an idea of the 
growth of run-time system cost of the various schemes as a function of problem size and as a 
fraction of the total cost. 

Several trends are apparent from Figure 8.9. The CDSR scheme consistently has the lowest
run-time cost since it does not perform any garbage collection and only incurs the basic heap
and frame management cost (allocation and deallocation). The fraction of time spent in the
run-time system varies widely depending upon the nature of the application and the cost and
the number of garbage collections performed. For example, Paraffins allocates a lot of
small-sized data-structures, keeping them live until the very end. Thus, each mark phase has to do a
lot of work. Similarly, Quicksort rapidly unfolds into a tree of activation frames, each of which
holds onto a substantial amount of storage, so the cost of marking is high there as well. On
the other hand, for Gamteb, the size of the live heap is quite small so the garbage collected
schemes incur very little overhead.

Footnote 8: Even if only a single processor is used out of a multi-processor *T configuration,
all messages are sent out to the network and received after some delay. This may cause idle
cycles on the processor if it does not have anything else to do.

[Figure 8.10 (stacked bar chart): run-time system cycles (vertical axis ticks from 1000 to 5000)
for Quicksort 100, Paraffins 13, Gamteb 100, and Wavefront 40x40, broken into Type-Rec.,
Sweep, Mark-Object, Mark-Frame, Heap-Mgmt., and Frame-Mgmt.]

Figure 8.10: Run-time System Cost Breakup.

Comparing the relative run-time costs of TRGC and CGC, we find that for Quicksort
and Paraffins, TRGC does worse than CGC, while for Wavefront TRGC performs better. This
wide variation can be explained by examining the run-time cost breakup shown in Figure 8.10
for the largest sized runs. We split the basic storage management cost shown in Figure 8.7 and
Figure 8.8 between the cost of managing the frame area and the cost of managing the heap.
The marking cost is similarly split between the cost of marking the frames and the cost of
marking the live heap objects.

Looking at Figure 8.10, TRGC spends a significant amount of time in the type reconstruction
phase for both Quicksort and Paraffins. This is because both these benchmarks contain
several polymorphic functions. Thus, the type reconstruction mechanism has to generate and
propagate the exact run-time type instantiation down from the root to each polymorphic frame
in the dynamic call tree. On the other hand, the type reconstruction cost is hardly visible in
Gamteb and Wavefront, which are not polymorphic and largely consist of first-order functions.
Furthermore, during type reconstruction and interpreted marking, the run-time types are
represented as C data-structures and are currently managed using conventional malloc and free
calls. This cost can be substantially reduced by using a specialized version of malloc.

The marking cost of TRGC is also about 1.5-2.2 times higher than that of CGC in case 
of Quicksort of 100 elements and Paraffins of 13 carbon atoms. Our current implementation 
interprets the type structures at run-time in order to traverse and mark the corresponding run- 
time objects. This interpretation overhead could be eliminated by using the compiled marking 
schema as described in Section 8.4.2 where the compiler generates a specialized marking routine 
for each source type parameterized over its polymorphic variables. Furthermore, these routines
can be inlined to produce highly optimized traversal and marking functions for each user-defined 
function activation frame. 
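The difference between interpreting a type at mark time and compiling it once can be illustrated with closures. The sketch below is not the schema of Section 8.4.2: the tuple encoding of types (`("scalar",)`, `("array", elem)`, `("record", [fields])`), the function name `compile_marker`, and the dictionary heap are all assumptions. It also handles only non-recursive, monomorphic types, and does not model parameterization over polymorphic variables.

```python
def compile_marker(ty):
    """Translate a type description into a specialized marking function.

    Returns None for scalar types, which need no marking at all.
    """
    kind = ty[0]
    if kind == "scalar":
        return None
    if kind == "array":
        mark_elem = compile_marker(ty[1])

        def mark_array(ref, heap, marked):
            if ref in marked:
                return
            marked.add(ref)
            if mark_elem is not None:       # scalar arrays are never scanned
                for value in heap[ref]:
                    mark_elem(value, heap, marked)

        return mark_array
    if kind == "record":
        compiled = [(i, compile_marker(f)) for i, f in enumerate(ty[1])]
        pointer_fields = [(i, m) for i, m in compiled if m is not None]

        def mark_record(ref, heap, marked):
            if ref in marked:
                return
            marked.add(ref)
            obj = heap[ref]
            for i, mark_field in pointer_fields:   # only pointer-bearing fields
                mark_field(obj[i], heap, marked)

        return mark_record
    raise ValueError(f"unknown type constructor: {kind}")
```

All inspection of the type structure happens once, when the marker is built. At collection time the marker only follows the fields already known to hold pointers, which is the saving the compiled schema is expected to bring over interpretation.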

In the case of Wavefront, TRGC takes much less time than CGC, and very little more
time in total than CDSR, where no marking at all took place. For Wavefront of 40 x 40, the
marking cost of CGC is 25 times higher than that of TRGC. TRGC did so well because it
could determine that the arrays contained only scalar data by inspecting their run-time type.
Therefore, it only marked the arrays themselves and did not scan for pointers inside them, as
CGC did. This scanning cost depends on the total size of the arrays and was responsible for
the quadratic growth in run-time cost for CGC as shown in Figure 8.9. However, sweeping took
the same amount of time for both TRGC and CGC.

The Wavefront example shows that in an ideal situation, the time to mark the heap for TRGC
is proportional to the total number of live object references, rather than the total amount of live
storage as it is for CGC. TRGC can use the reconstructed type information to avoid scanning
elements of scalar arrays and scalar fields within records and algebraic types.
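The contrast between the two marking strategies can be made concrete by counting how many slots each one has to examine. The following is a toy model under stated assumptions: objects live in a dictionary keyed by string references, types use a hypothetical tuple encoding (`("scalar",)`, `("array", elem)`, `("record", [fields])`), and the conservative pointer test is reduced to a dictionary membership check.

```python
def mark_typed(ref, ty, heap, marked):
    """Type-directed marking; returns the number of slots examined."""
    kind = ty[0]
    if kind == "scalar" or ref in marked:
        return 0
    marked.add(ref)
    if kind == "array":
        if ty[1][0] == "scalar":
            return 0                      # scalar array: contents never scanned
        field_types = [ty[1]] * len(heap[ref])
    else:                                 # "record"
        field_types = ty[1]
    examined = 0
    for value, field_ty in zip(heap[ref], field_types):
        if field_ty[0] == "scalar":
            continue                      # scalar field: skipped by type alone
        examined += 1 + mark_typed(value, field_ty, heap, marked)
    return examined


def mark_conservative(ref, heap, marked):
    """Conservative marking; every slot of every reachable object is tested."""
    if ref in marked:
        return 0
    marked.add(ref)
    examined = 0
    for value in heap[ref]:
        examined += 1
        if value in heap:                 # looks like a pointer
            examined += mark_conservative(value, heap, marked)
    return examined
```

For a matrix stored as an array of two rows of three floating-point numbers each, both routines mark the same three objects, but the typed version examines only the two row references while the conservative version examines all eight slots.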

Space Analysis 

In terms of space usage, TRGC and CGC perform identically. As shown in Figure 8.7 and
Figure 8.8, both TRGC and CGC perform the same number of garbage collections in all runs
and use roughly the same amount of heap storage. Both TRGC and CGC runs were provided
with the same amount of initial storage, and the size of the initial storage was kept
sufficiently large to avoid thrashing. This accounts for the small number of garbage collections
performed.

Each garbage collected run also performed a final GC at the end of the run to reclaim all the
uncollected garbage. Due to this final garbage collection, the TRGC and CGC runs actually
reclaimed more storage than the CDSR runs, because the compiler could not insert deallocation
commands for all of the temporary storage.

CGC is able to reclaim all the garbage because of our restrictive compilation model and 
support from the run-time system. As mentioned earlier, in our system all actual pointers
directly point to the head of a heap object. This not only reduces the overhead of guessing 
whether a given value is a valid heap pointer or not but also avoids creating many more am- 
biguous pointers for the garbage collector to check for. The run-time system further eliminates 
the chances of making the wrong guess by marking the head of every object with a special 
bit-pattern. 






The performance of CDSR varies with the application. For Gamteb and Wavefront, CDSR
is able to insert deallocation commands to reclaim all the garbage automatically. Therefore,
these benchmarks are able to run under CDSR without leaking any storage. The garbage
collected versions for these benchmarks had to be given 2-10 times the storage used by CDSR
to avoid thrashing. On the other hand, for Paraffins and Quicksort, CDSR is able to reclaim
only 10-20% of the total garbage; therefore the TRGC and CGC versions are able to run in
the same or less storage than the CDSR version without thrashing. This shows that in general,
CDSR may need additional storage reclamation support from an independent garbage collector,
although it works very efficiently for applications where data-structures are easily analyzed.

8.7 Conclusions 

In this chapter, we have described a direct application of complete run-time type reconstruction,
namely, tagless garbage collection (TRGC). We used the reconstruction algorithm described in
Chapter 7 to reconstruct the exact types of all run-time objects. We also described an inter- 
preted and a compiled marking schema for traversing and marking live run-time objects using 
the reconstructed type information. We have implemented the interpreted marking schema on 
a simulator for the *T architecture and compared its performance with conservative garbage 
collection (CGC) and compiler-directed storage reclamation (CDSR) on several benchmarks. 

Our results show that in general, TRGC does more work in marking the live objects than
CGC, unless it can avoid scanning large, scalar, array-like objects using type information.
The type reconstruction overhead increases with the amount of polymorphism and
higher-order functions (closures) used in the program, although the cost of reconstruction is small
compared to the cost of marking live objects with type interpretation. The cost of interpreted
marking itself should be reduced considerably by using the compiled marking schema instead of
type interpretation.

TRGC has the additional advantage that other storage reclamation schemes may be used,
such as compaction or copying. These may not be used with CGC because they require updating
live pointers, and CGC cannot guarantee that what it uses as a pointer is not really a scalar
value. On the other hand, TRGC requires initialization of polymorphic and pointer data with
valid values and cannot cope with stale data as CGC can.

CDSR consistently does better than either of the garbage collection schemes in terms of
time spent in the run-time system. This is as expected, although sometimes it is not able to
collect all the garbage and therefore requires more memory than strictly necessary. CDSR also
takes much longer to compile, sometimes increasing compile-time by a factor of 10.

On the whole, type reconstruction and type-reconstruction-based garbage collection seem
to be a promising area of research with a lot of scope for compiler optimization and run-time
performance improvement. This initial study has shown that type-reconstruction-based garbage
collection is certainly feasible and can be competitive with other storage management strategies
under the right mix of applications.

8.7.1 Future Work 

There are several dimensions in which further investigation would be useful. The first step would
be to implement the compiled marking schema and compare its performance with our current
interpreted marking schema. We expect to see a substantial improvement in performance using
specialized marking functions. Our experience also shows that mixed storage management
schemes that combine garbage collection with explicit storage reclamation within the same
run-time environment are feasible and may be able to combine the benefits of each scheme
running on its own.

Although our system has been designed and implemented for a multi-processor architecture,
we have so far studied only a single processor. We would like to see how TRGC
scales under a multi-processor environment and quantify the inter-processor communication
overhead for type reconstruction.

It would be very interesting to compare the performance of TRGC with an explicitly tagged 
object identification scheme implemented within the same framework. It would be interesting 
to know if TRGC offers any concrete advantages over that technique. 

Finally, it would be useful to implement a compacting garbage collector based on type
reconstruction with a very simple allocation scheme (bumping a pointer) and compare its heap
management overhead with that of the CGC and CDSR that require a more sophisticated
storage management scheme (free-lists).






Bibliography 



[AA91] Zena M. Ariola and Arvind. Compilation of Id. In Proceedings of the Fourth
Workshop on Languages and Compilers for Parallel Computing, Santa Clara, California,
August 1991. Also available as CSG Memo 341, MIT Lab. for Computer Sc.,
Cambridge, MA 02139.

[AA93] Zena M. Ariola and Arvind. Graph Rewriting Systems: Capturing Sharing of
Computation in Language Implementations. Computation Structures Group Memo 347,
MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, MA
02139, April 1993.

[AA94] Zena M. Ariola and Arvind. Properties of a First-order Functional Language with
Sharing. CSG Memo 347-1, Laboratory for Computer Science, MIT, Cambridge,
MA 02139, June 1994. To appear in Theoretical Computer Science, September
1995.

[AC93] Shail Aditya and Alejandro Caro. Compiler-directed Type Reconstruction for
Polymorphic Languages. In Proceedings of the ACM Conference on Functional
Programming Languages and Computer Architecture, Copenhagen, Denmark, pages 74-82,
June 1993.

[AFH94] Shail Aditya, Christine H. Flood, and James E. Hicks. Garbage Collection for
Strongly-Typed Languages using Run-time Type Reconstruction. In Proceedings of
the ACM Conference on Lisp and Functional Programming, Orlando, Florida, USA,
pages 12-23. ACM Press, June 1994.

[AHU74] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis
of Computer Algorithms. Addison-Wesley, 1974.

[AM89] Andrew W. Appel and David B. MacQueen. Standard ML Reference Manual.
Princeton University and AT&T Bell Laboratories, Preliminary edition, 1989.
Distributed along with the Standard ML of New Jersey Compiler.

[ANP89] Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. I-Structures: Data Structures for
Parallel Computing. ACM Transactions on Programming Languages and Systems,
11(4):598-632, 1989.

[App89] Andrew W. Appel. Runtime tags aren't necessary. Lisp and Symbolic Computation,
2(2):153-163, June 1989.

[App90] Andrew W. Appel. A runtime system. Lisp and Symbolic Computation,
3(4):343-380, November 1990.



[Bar88] Joel F. Bartlett. Compacting Garbage Collection with Ambiguous Roots. Research
Report 88/2, Western Research Laboratory, Digital Equipment Corporation,
February 1988.

[Bar92] Paul S. Barth. Atomic Data Structures for Parallel Computing. PhD thesis,
Laboratory for Computer Science, MIT, Cambridge, MA 02139, March 1992. Available
as Technical Report MIT/LCS/TR-532.

[Bec92] Michael J. Beckerle. An Overview of the START(*T) Computer System.
Motorola Technical Report MCRC-TR-28, Motorola Cambridge Research Center, One
Kendall Square, Building 200, Cambridge, MA 02139, July 1992.

[Blo89] A. Bloss. Update analysis and the efficient implementation of functional aggregates.
In Proceedings of the ACM Conference on Functional Programming Languages and
Computer Architecture, London, UK. ACM, September 1989.

[BNA91] Paul S. Barth, Rishiyur S. Nikhil, and Arvind. M-Structures: Extending a Parallel,
Non-Strict, Functional Language with State. In Proceedings of the ACM Conference
on Functional Programming Languages and Computer Architecture, pages 538-568.
Springer-Verlag, 1991. LNCS 523.

[Bur77] Rod M. Burstall. Design Considerations for a Functional Programming Language. 
In Infotech State of the Art Conference: The Software RevolutionTOctober 1977. 

[BW88] H.-J. Boehm and M. Weiser. Garbage Collection in an Uncooperative Environment. 
Software: Practice and Experience, 18(9):807-820, September 1988.

[Car89] L. Cardelli. Typeful Programming. In E. J. Neuhold and M. Paul, editors, Formal
Description of Programming Concepts, pages 431-507. Springer-Verlag, 1989.

[Car93] Alejandro Caro. A Debugger for Id. Master's thesis, Massachusetts Institute of
Technology, February 1993.

[CCF+93] Derek Chiou, Alejandro Caro, Christine Flood, James E. Hicks, and Michael J.
Beckerle. Run time support for Id running on *T, version 1.4. Computation
structures group memo, MIT, Laboratory for Computer Science, 545 Technology Square,
Cambridge, MA 02139, May 1993.

[Dam85] L. Damas. Type Assignment in Programming Languages. PhD thesis, University of
Edinburgh, Department of Computer Science, 1985.

[Dar77] John Darlington. Program Transformation and Synthesis: Present Capabilities. 
Research Report 77/43, Department of Computing and Control, Imperial College
of Science and Technology, London, September 1977. Also appears as Report No.
48, Department of Artificial Intelligence, University of Edinburgh.

[DM82] L. Damas and R. Milner. Principal Type-schemes for Functional Programs. In 
Proceedings of the 9th Symposium on Principles of Programming Languages, pages
207-212, January 1982.

[FLR+94] Babak Falsafi, Alvin R. Lebeck, Steven K. Reinhardt, Ioannis Schoinas, Mark D.
Hill, James R. Larus, Anne Rogers, and David A. Wood. Application-Specific
Protocols for User-Level Shared Memory. In Supercomputing '94, Proceedings. IEEE
Computer Society Press, November 1994.



[GG92] Benjamin Goldberg and Michael Gloger. Polymorphic Type Reconstruction for 
Garbage Collection without Tags. In Proceedings of the ACM Conference on Lisp 
and Functional Programming, pages 53-65, 1992.

[GJLS87] David K. Gifford, Pierre Jouvelot, John M. Lucassen, and Mark A. Sheldon.
FX-87 Reference Manual. Technical Report MIT/LCS/TR-407, MIT Laboratory for
Computer Science, September 1987.

[GJSO91] David K. Gifford, Pierre Jouvelot, Mark A. Sheldon, and James W. O'Toole. Report
on the FX-91 Programming Language. Technical Report MIT/LCS/TR-531, MIT
Laboratory for Computer Science, 1991.

[Gol91] Benjamin Goldberg. Tag-free garbage collection for strongly typed programming 
languages. In SIGPLAN '91 Conference on Programming Language Design and 
Implementation, pages 165-176, June 1991.

[GP90] Benjamin Goldberg and Young Gil Park. Higher Order Escape Analysis: Optimizing 
Stack Allocation in Functional Program Implementations. In Proceedings of the 
3rd European Symposium on Programming, pages 152-160. Springer-Verlag, 1990.
LNCS 432. 

[GPG91] Young Gil Park and Benjamin Goldberg. Reference Escape Analysis: Optimizing 
Reference Counting based on the Lifetime of References. In Proceedings of the ACM 
Symposium on Partial Evaluation and Semantics-based Program Manipulation, Yale 
University, New Haven, CT, USA, pages 178-189. ACM Press, June 1991.

[Gup90] Shail Aditya Gupta. An Incremental Type Inference System for the Programming 
Language Id. Master's thesis, Laboratory for Computer Science, MIT, Cambridge,
MA 02139, September 1990. Available as Technical Report MIT/LCS/TR-488.

[HI89] W.L. Harrison III. The interprocedural analysis and automatic parallelization of
scheme programs. Lisp and Symbolic Computation, 2(3-4):179-396, 1989.

[Hic93] James E. Hicks. Experiences with compiler-directed storage reclamation. In
Conference on Functional Programming Languages and Computer Architecture, 1993.

[HJ92] James E. Hicks Jr. Compiler-directed Storage Reclamation using Object Lifetime
Analysis. PhD thesis, Laboratory for Computer Science, MIT, Cambridge, MA
02139, 1992. Available as Technical Report MIT/LCS/TR-555.

[HM93] R. Harper and J. C. Mitchell. On the Type Structure of Standard ML. ACM 
Transactions on Programming Languages and Systems, 15:211-252, April 1993.

[HMV93] My Hoang, John Mitchell, and Ramesh Viswanathan. Standard ML weak
polymorphism and imperative constructs. In Proceedings of the Eighth Annual Symposium
on Logic in Computer Science, pages 15-25. ACM Press, June 1993.

[Hud92] Paul Hudak. Mutable Abstract Datatypes -or- How to Have Your State and Munge 
It Too. Research Report YALEU/DCS/RR-914, Department of Computer Science,
Yale University, New Haven, CT 06520, December 1992. Revised May 1993.



[HWe90] P. Hudak and P. Wadler (editors). Report on the programming language
Haskell, a non-strict purely functional language (Version 1.0). Technical Report
YALEU/DCS/RR777, Yale University, Department of Computer Science, April
1990.

[JG91] Pierre Jouvelot and David K. Gifford. Algebraic Reconstruction of Types and Effects.
In Proceedings of the 1991 ACM Conference on Principles of Programming
Languages, pages 303-310. ACM, 1991.

[Joh85] Thomas Johnsson. Lambda lifting: Transforming programs to recursive equations. 
In Springer-Verlag LNCS 201 (Proc. Functional Programming Languages and
Computer Architecture, Nancy, France), September 1985.

[JW75] Kathleen Jensen and Niklaus Wirth. PASCAL User Manual and Report.
Springer-Verlag, 1975.

[KR88] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language.
Prentice Hall, second edition, 1988.

[LAB+81] Barbara Liskov, Russell Atkinson, Toby Bloom, Eliot Moss, J. Craig Schaffert,
Robert Scheifler, and Alan Snyder. CLU Reference Manual, volume 114 of
Lecture Notes in Computer Science. Springer-Verlag, 1981.

[Ler92] Xavier Leroy. Polymorphic Typing of an Algorithmic Language. Rapports de 
Recherche 1778, INRIA, Rocquencourt, France, October 1992. English translation
of the author's Ph.D. thesis originally in French. 

[Ler93] Xavier Leroy. Polymorphism by name for references and continuations. In
Proceedings of the ACM Symposium on Principles of Programming Languages. ACM Press,
1993.

[LG88] John M. Lucassen and David K. Gifford. Polymorphic Effect Systems. In
Proceedings of the Fifteenth Annual ACM SIGACT-SIGPLAN Symposium on Principles of
Programming Languages, San Diego, California, pages 47-57, January 1988.

[LPJ94] John Launchbury and Simon L. Peyton Jones. Lazy Functional State Threads. In 
Proceedings of the ACM SIGPLAN Conference on Programming Language Design 
and Implementation, Orlando, Florida, USA. ACM, June 1994.

[Luc87] John M. Lucassen. Types and Effects - Towards the Integration of Functional and 
Imperative Programming. PhD thesis, Laboratory for Computer Science, MIT,
Cambridge, MA 02139, August 1987. Available as Technical Report MIT/LCS/TR-408.

[LW91] Xavier Leroy and Pierre Weis. Polymorphic type inference and assignment. In 
Proceedings of the ACM Symposium on Principles of Programming Languages, pages
291-302. ACM, January 1991.

[Mai90] Harry G. Mairson. Deciding ML Typability is Complete for Deterministic
Exponential Time. In Proceedings of the 17th ACM Symposium on Principles of
Programming Languages, pages 382-401, January 1990.

[Mil78] Robin Milner. A theory of type polymorphism in programming. Journal of
Computer and System Sciences, 17:348-375, 1978.



[MT91] Robin Milner and Mads Tofte. Commentary on Standard ML. The MIT Press,
Cambridge, Massachusetts, 1991.

[MTH90] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML.
The MIT Press, Cambridge, Massachusetts, 1990.

[NAH93] Rishiyur S. Nikhil, Arvind, and James Hicks. pH Language Proposal (Preliminary).
Circulated on the pH mailing list, September 1993.

[Nik91] Rishiyur S. Nikhil. Id Language Reference Manual Version 90.1. Technical Report
CSG Memo 284-2, MIT Laboratory for Computer Science, 545 Technology Square,
Cambridge, MA 02139, July 15 1991.

[Nik94] Rishiyur S. Nikhil. Cid: A Parallel, "Shared-memory" C for Distributed-memory
Machines. In Proceedings of the 7th Annual Workshop on Languages and Compilers
for Parallel Computing, Ithaca, NY. Cornell Theory Center, Cornell University,
August 1994.

[NPA92] Rishiyur S. Nikhil, Gregory M. Papadopoulos, and Arvind. *T: A Multithreaded
Massively Parallel Architecture. In Proceedings of the 19th International Symposium
on Computer Architecture, Queensland, Australia. ACM Press, May 1992.

[OJ91] James William O'Toole Jr. Type Abstraction Rules for References: A comparison of
four which have achieved notoriety. Technical Memo MIT/LCS/TM-390,
Laboratory for Computer Science, MIT, 545 Technology Square, Cambridge, Massachusetts
02139, August 1991.

[PBGB93] G. M. Papadopoulos, G. A. Boughton, R. Greiner, and M. J. Beckerle. *T:
Integrated building blocks for parallel computing. In Proceedings of Supercomputing
'93, 1993.

[PJ92] Simon L. Peyton Jones. Implementing lazy functional languages on stock hardware:
the Spineless Tagless G-machine. Journal of Functional Programming, 2(2):127-202,
April 1992.

[PJW92] Simon L. Peyton Jones and Philip Wadler. A static semantics for Haskell, February
1992.

[PJW93] Simon L. Peyton Jones and Philip Wadler. Imperative Functional Programming. In 
Proceedings of the 20th ACM Symposium on Principles of Programming Languages,
Charleston, South Carolina, USA, pages 71-84. ACM, January 1993.

[Pla92] P.J. Plauger. The Standard C Library. Prentice Hall, Englewood Cliffs, New Jersey
07632, 1992.

[Plo81] G. D. Plotkin. A Structural Approach to Operational Semantics. Technical Report 
DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus,
Denmark, September 1981.

[Rey74] J. C. Reynolds. Towards a Theory of Type Structure. In Paris Colloquium on 
Programming, volume 19 of Lecture Notes in Computer Science, pages 408-425.
Springer-Verlag, 1974.



[Rob65] J. A. Robinson. A Machine-Oriented Logic Based on the Resolution Principle. 
Journal of the ACM, 12(1):23-41, 1965.

[SJ90] Guy L. Steele Jr. Common Lisp: The Language. Digital Press, second edition, 1990.

[TJ92] Jean-Pierre Talpin and Pierre Jouvelot. The Type and Effect Discipline. In
Proceedings of the ACM Symposium on Logic in Computer Science, pages 162-173. ACM
Press, 1992.

[Tof90] Mads Tofte. Type Inference for Polymorphic References. Information and
Computation, 89:1-34, 1990.

[Tol94] Andrew Tolmach. Tag-free Garbage Collection Using Explicit Type Parameters. 
In Proceedings of the 1994 ACM Conference on Lisp and Functional Programming,
pages 1-11. ACM Press, June 1994.

[Tra86] Kenneth R. Traub. A Compiler for the MIT Tagged-Token Dataflow Architecture. 
Master's thesis, Laboratory for Computer Science, MIT, Cambridge, MA 02139,
August 1986. Available as Technical Report MIT/LCS/TR-370. 

[TT93] Mads Tofte and Jean-Pierre Talpin. A Theory of Stack Allocation in
Polymorphically Typed Languages. Technical Report 93/15, Department of Computer Science
(DIKU), Copenhagen University, 1993.

[Tur85] D. A. Turner. Miranda: A non-strict functional language with polymorphic types. 
In Lecture Notes in Computer Science, volume 201. Springer-Verlag, September 1985.

[Wad90] Philip Wadler. Linear types can change the world! In Proceedings of the Working 
Conference on Programming Concepts and Methods, Israel, pages 385-407.
North-Holland, 1990.

[WB89] Philip Wadler and Stephen Blott. How to make ad-hoc polymorphism less ad hoc. In 
Proceedings of the 16th ACM Symposium on Principles of Programming Languages,
Austin, Texas, pages 60-76, January 1989.

[Wil92] Paul R. Wilson. Uniprocessor Garbage Collection Techniques. In Proceedings of the 
International Workshop on Memory Management, St. Malo, France, pages 1-42.
Springer-Verlag, September 1992. LNCS 637.

[Wri92] Andrew K. Wright. Typing References by Effect Inference. In Proceedings of the 4th 
European Symposium on Programming, Rennes, France, pages 473-491.
Springer-Verlag, February 1992. Lecture Notes in Computer Science, volume 582.

[Wri93] Andrew K. Wright. Polymorphism for Imperative Languages without Imperative 
Types. Technical Report TR93-200, Rice University, February 1993.


